Abstract



This thesis explores the topics of parameter estimation and model reduction in the 
context of quantum filtering. The last is a mathematically rigorous formulation of 
continuous quantum measurement, in which a stream of auxiliary quantum systems 
is used to infer the state of a target quantum system. Fundamental quantum uncer- 
tainties appear as noise which corrupts the probe observations and therefore must be 
filtered in order to extract information about the target system. This is analogous 
to the classical filtering problem in which techniques of inference are used to process 
noisy observations of a system in order to estimate its state. Given the clear simi- 
larities between the two filtering problems, I devote the beginning of this thesis to a 
review of classical and quantum probability theory, stochastic calculus and filtering. 
This allows for a mathematically rigorous and technically adroit presentation of the 
quantum filtering problem and solution. 
Given this foundation, I next consider the related problem of quantum parameter
estimation, in which one seeks to infer the strength of a parameter that drives
the evolution of a probe quantum system. By embedding this problem in the state
estimation problem solved by the quantum filter, I present the optimal Bayesian
estimator for a parameter when given continuous measurements of the probe system
to which it couples. For cases when the parameter takes on a finite number of values, I
review a set of sufficient conditions for asymptotic convergence of the estimator. For 
a continuous-valued parameter, I present a computational method called quantum 
particle filtering for practical estimation of the parameter. Using these methods, I 
then study the particular problem of atomic magnetometry and review an experimen- 
tal method for potentially reducing the uncertainty in the estimate of the magnetic 
field beyond the standard quantum limit. The technique involves double-passing a 
probe laser field through the atomic system, giving rise to effective non-linearities 
which enhance the effect of Larmor precession allowing for improved magnetic field 
estimation. 

I then turn to the topic of model reduction, which is the search for a reduced 
computational model of a dynamical system. This is a particularly important task 
for quantum mechanical systems, whose state grows exponentially in the number of 
subsystems. In the quantum filtering setting, I study the use of model reduction in
developing a feedback controller for continuous-time quantum error correction. By 
studying the propagation of errors in a noisy quantum memory, I present a
computational model which scales polynomially, rather than exponentially, in the number
of physical qubits of the system. Although inexact, a feedback controller using this 
model performs almost indistinguishably from one using the full model. I finally re- 
view an exact but polynomial model of collective qubit systems undergoing arbitrary 
symmetric dynamics which allows for the efficient simulation of spontaneous-emission 
and related open quantum system phenomena.
Chapter 1



Introduction 



A striking feature of quantum mechanics is its inherent uncertainty. Even when 
given a complete description of a system, quantum mechanics generally prescribes 
probabilities for measurement outcomes when a corresponding classical theory pre- 
scribes certainties. Given that quantum mechanics is a fundamental theory, one 
might suspect that quantum uncertainty significantly restricts our ability to accom- 
plish physical tasks. Yet rather remarkably, quantum information theory shows that 
this is not always the case. In fact, there are many tasks for which a quantum system 
significantly outperforms its classical counterpart, most notably quantum algorithms 
for factoring [Shor 1994] and searching [Grover 1996], quantum protocols for
communication [Bennett and Brassard 1984; Bennett et al. 1993] and quantum techniques
for precision measurement [Xiao et al. 1987]. 

Yet the need to cope with uncertainty is not unique to quantum systems.
Indeed, noise is nearly ubiquitous in any real-world situation, when it is impractical
or impossible to exactly describe the physics of the environment surrounding the 
system of interest or even the details of the system itself. Such uncertainty gives 
rise to a stochastic, rather than deterministic, description of a system and of the 
corresponding measurement process. Again, it is perhaps startling that in the face 
of uncertainty, one can still perform tasks remarkably well, although we experience 
such performances whenever we fly on an airplane, turn on a computer or purchase 
the correct birthday gift for a loved one. 
Over the past century, the fields of stochastic control and estimation theory have
made great strides in formalizing techniques for overcoming the presence of noise. 
One such technique is filtering, which is a method for estimating the state of a 
stochastic system by appropriately processing noisy measurements of that system 
[Liptser and Shiryayev 1977]. Another technique is feedback, in which one seeks
to control a stochastic system to achieve a particular goal [Zhou et al. 1996]. Not 
surprisingly, the two are intimately related, in that deciding a feedback policy often 
first requires filtering the noisy measurements to determine exactly what the system is 
doing under all that noise. In building a mathematical apparatus for handling noise, 
these theories are broadly applicable across a variety of engineering and scientific 
disciplines. As our technological capability to manipulate and measure distinctly 
quantum systems matures along with the host of quantum information processing 
tasks we seek to perform, it is clear that control and estimation techniques will play 
an important role in the quantum engineering realm. 

Certainly, the goals of quantum control and estimation are no different than 
those for classical systems; primarily, the capability to build robust and stable sys- 
tems which accomplish a desired task. However, the engineering difficulties are more 
fundamental in the quantum case — one must isolate a quantum system from its envi- 
ronment in order to preserve quantum coherence and manipulate intrinsic quantum 
uncertainties, yet the isolation cannot be so severe as to preclude useful external 
interactions required for controlling and measuring the system. Dealing with these 
inimical demands at an abstract level is well appreciated in quantum information 
theory, particularly in the areas of quantum error-correction [Gottesman 1997] and 
quantum fault-tolerance [Aharonov and Ben-Or 1996]. Less abstractly, a plethora of 
robust methods have been developed for spin control and nuclear magnetic resonance 
applications [Vandersypen and Chuang 2004], where both fundamental quantum un- 
certainty and technical noise play important roles. Some of the methods of classical 
control and estimation theory appear implicitly in both of these quantum engineering 
approaches, but given the success and relative maturity of the classical methods, it
seems prudent to make the analogy more explicit, mining the vast library of known 
classical techniques which can then be suitably modified to reflect the constraints 
imposed by the laws of quantum mechanics. Even more simply, putting quantum 
control and estimation theories in the language of their classical progenitors pro- 
vides an elegant and technically convenient way to decompose and study a quantum 
engineering problem. 

Such a reformulation is exceptionally useful in the domain of quantum optics, 
where statistical properties of laser light map rather directly onto the classical 
stochastic formalism. Belavkin was one of the first to flesh out this mapping, devel- 
oping a rigorous theory of quantum filtering and control [Belavkin 1979; 1987; 1999] 
in terms of the axiomatic probability theory and optimal control formalism used 
when dealing with classical stochastic systems, suitably adapted to the quantum 
domain by Hudson and Parthasarathy [1984]. As experimental prowess and poten- 
tial applications grew, Belavkin's filtering techniques were independently discovered 
in a more heuristic approach called quantum trajectory theory [Carmichael 1991]. 
Initially, quantum trajectories were seen as a computational tool for simulating the 
dynamics of open quantum systems, averaging over many stochastic quantum jump 
evolutions to simulate quantum master equation dynamics. This soon evolved into a 
theory of continuous quantum measurement and feedback [Wiseman 1994], which, in
conjunction with a renaissance of the earlier filtering work [Bouten et al. 2007a] and
a closing gap between theoretical possibilities and experimental realities [Mabuchi
et al. 1999], suggests that quantum control theory and quantum optics are a useful pair for
exploring quantum control applications, including precision metrology [Armen et al. 
2002; Geremia et al. 2003] and quantum error correction [Ahn et al. 2002]. 

It is within this propitious environment that my own research in quantum filtering
and control has developed, largely along two main threads¹: quantum parameter
estimation and quantum model reduction. The former is essentially a filtering problem,
in which one seeks to estimate a parameter that modulates the dynamical evolution
of a probe quantum system. With knowledge of the dynamics, suitable measurements
of the probe system can be used to determine the parameter of interest. However,
the inherent quantum fluctuations in the probe measurement appear as noise which
corrupts the probe signal, requiring statistical inference or filtering to best estimate
the parameter. In Chapter 4, I review my work on developing a general filter for
quantum parameter estimation via continuous quantum measurement [Chase and
Geremia 2009b]. By embedding the parameter estimation problem in the state
estimation problem of quantum filtering, I develop the optimal Bayesian parameter filter
and discuss conditions for its convergence to the true parameter value. I also discuss
an approximate computational method called quantum particle filtering suitable for
practical quantum parameter estimation. In Chapter 5, I review the application
of these techniques for a proposed experimental demonstration of precision
magnetometry [Chase et al. 2009a; Chase and Geremia 2009a; Chase et al. 2009b]. By
double-passing an optical field through an atomic system, one hopes to create
effective nonlinear interactions which offer improved sensitivity to the strength of an
external magnetic field. Determining the magnetic field strength from measurements
of the scattered optical field is precisely the filtering problem discussed above.
Although a careful derivation of approximate quantum Kalman filters using the method
of projection filtering shows no improvement, numerical simulations of the exact
dynamics and quantum particle filters suggest an improvement does exist. By studying
this example, I hope to demonstrate that the quantum filtering formalism provides
an elegant framework for studying parameter estimation problems.

¹A third research thread that doesn't fit within the quantum filtering and control
umbrella is work I did with Andrew Landahl on the computational universality of quantum
walks in one spatial dimension [Chase and Landahl 2008].

The second topic of model reduction deals with developing a computationally
reduced description of a quantum dynamical system, whose most general description
grows exponentially with the number of subsystems involved. In practice, one is
oftentimes only interested in the dynamics of a restricted set of observables or a
restricted set of initial states, both of which may not require calculating the exact 
dynamics. I review such a case in Chapter 6, presenting a reduced model of error 
propagation in a continuously measured quantum memory subject to noise [Chase 
et al. 2008]. The model is then used by a classical feedback controller to perform 
continuous error correction, with almost no change in performance relative to an 
exact model. The reduced description scales only polynomially in the number of 
physical qubits, an improvement over the exponential scaling of the exact model. 
Similar reductions will generally be useful for feedback controllers and filters which 
are usually processed on a classical computer. In Chapter 7, I present similar but 
unrelated model reduction research, describing a polynomial but exact model of
collective dynamical processes of ensembles of qubits [Chase and Geremia 2008]. 
This allows for efficient numerical simulation of a broad range of collective qubit 
systems, particularly those involving spontaneous emission. 

But before delving into my own research, I begin in Chapters 2 and 3 by review- 
ing some essential elements of classical and quantum probability theory, stochastic 
calculus and filtering. The goal is to provide a "user's guide" to the existing body of 
mathematical physics literature, occasionally delving into the mathematical details, 
but focusing more on the tools needed for quantum control and filtering problems. 
There are several reasons for such an exposition. Firstly, it has been my experience 
that these methods are underappreciated in the quantum optics community, perhaps 
out of apprehension towards the mathematical rigor and language involved when 
more familiar quantum information approaches seem to suffice. However, climbing 
the seemingly steep initial learning curve quickly provides rewards in the form of 
an elegant and oftentimes superior approach for studying quantum continuous mea- 
surement and control problems. Secondly, there are technical reasons for preferring 
the rigorous results, especially due to mathematical issues inherent with continuous 
stochastic processes which include singular white noise terms. I believe these issues 
can be appreciated without fully detailing the mathematical technicalities involved,
which I certainly do not claim to master. Lastly and perhaps most importantly, I 
earnestly believe that taking the rigorous approach is the key to opening the vast li- 
brary of existing classical control and estimation tools which will allow for significant 
and rapid progress in the field of quantum engineering. 
Chapter 2

Classical Probability and Filtering 



Many scientists are familiar with the basic elements of probability theory (distributions,
expectations, random variables) and are quite comfortable performing calculations
using these elements. Given a fair, six-sided die whose faces are labeled 1-6, we
are comfortable stating that the probability of rolling any particular face is $1/6$. The
probability of rolling a face with an even number is also easily calculated as

\[
  \mathbb{P}(\text{roll is even}) = \sum_{i \in \{2,4,6\}} \mathbb{P}(\text{roll face } i) = 3 \times \frac{1}{6} = \frac{1}{2}. \tag{2.1}
\]

This suggests the general rule "The probability of obtaining some set of mutually
exclusive outcomes is the sum of the probabilities of each of the outcomes", or
mathematically, for a collection of $n$ disjoint sets $A_1, \ldots, A_n$ the rule is

\[
  \mathbb{P}(A_1 \cup \cdots \cup A_n) = \sum_{i} \mathbb{P}(A_i). \tag{2.2}
\]
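As a quick numerical check of the additivity rule, the following minimal sketch recomputes Eq. (2.1) for the fair die using exact fractions. The names `FACE_PROB`, `prob`, `even` and `low` are illustrative choices and not part of the text.

```python
from fractions import Fraction

# Fair six-sided die: each face carries probability 1/6.
FACE_PROB = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(event):
    """Probability of an event (a set of faces), summing outcome probabilities."""
    return sum((FACE_PROB[face] for face in event), Fraction(0))

even = {2, 4, 6}
low = {1}  # disjoint from `even`

print(prob(even))  # 1/2, matching Eq. (2.1)
# Additivity for disjoint events, Eq. (2.2):
print(prob(even | low) == prob(even) + prob(low))  # True
```

The rule only applies because `even` and `low` share no faces; for overlapping events the sum would double count the common outcomes.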

We are also familiar with the uniform distribution on the interval $[0, 1]$. Given a
random variable $X$ with such a distribution, we know that $\mathbb{P}(0 \le X \le 1) = 1$, i.e.
that the random variable will take on some value on that interval. More generally,
for $0 \le a \le b \le 1$, we have

\[
  \mathbb{P}(a \le X \le b) = b - a, \tag{2.3}
\]

which also correctly calculates the probability of a point, i.e. $\mathbb{P}(X = a) = \mathbb{P}(a \le X \le a) = 0$.
Given our understanding of intervals, we then have

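\[
  \mathbb{P}\left(\tfrac{1}{4} \le X \le \tfrac{3}{4}\right) = \mathbb{P}\left(\tfrac{1}{4} \le X \le \tfrac{1}{2}\right) + \mathbb{P}\left(\tfrac{1}{2} < X \le \tfrac{3}{4}\right) = \tfrac{1}{2}, \tag{2.4}
\]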
which is reminiscent of our general rule in Eq. (2.2). We might expect this rule
to extend to an uncountably infinite number of disjoint sets, which for the entire
uniform distribution implies

\[
  \begin{aligned}
    \mathbb{P}(0 \le X \le 1) &= \sum_{x \in [0,1]} \mathbb{P}(X = x) \\
    1 &= 0.
  \end{aligned}
  \tag{2.5}
\]

Clearly, something just went wrong with trying to extend our rule to an uncount- 
ably infinite number of disjoint sets. This turns out to be the case in many situations, 
where simply applying the intuitive discrete probability rules in the continuous case 
gives ridiculous answers. Oftentimes, it's not even clear how to formulate questions
using our intuitive rules, such as for the uniformly distributed variable X above, 
what is the probability that it takes on a rational value? 

Given our ultimate interest in describing continuous random variables, especially 
uncountably infinite collections of random variables indexed by the continuous label 
time, it is important that we use a probability theory that deals with these
complications carefully. Indeed, the filtering problem is to consider the system ($X_t$) /
observations ($Y_t$) pair

\begin{align}
  \frac{dX_t}{dt} &= f(t, X_t) + g(t, X_t) \times \text{``noise''} \tag{2.6} \\
  \frac{dY_t}{dt} &= f(t, X_t) + g(t, X_t) \times \text{``noise''} \tag{2.7}
\end{align}



and perform inference about the state of the system based on the measurements. 
Since the "noise" terms are stochastic, both the system and observations are precisely 
the uncountable collections of random variables we need to consider. In fact, much 
care will be taken to define the "noise" terms in a mathematically sensible manner 
so that the filtering problem can be posed in a sensible fashion. 
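To make the system/observations pair concrete before the noise is defined carefully, here is a rough simulation sketch. It assumes the "noise" is Gaussian white noise, discretized with the Euler-Maruyama scheme, and that the system and the observations are driven by independent noises. The function `simulate`, the observation functions `h` and `k`, and the mean-reverting example are all hypothetical choices made for illustration, not the models studied later.

```python
import math
import random

def simulate(f, g, h, k, x0, dt, n_steps, seed=0):
    """Euler-Maruyama sketch of a system/observation pair.

    Discretizes  dX = f(t, X) dt + g(t, X) dW  and  dY = h(t, X) dt + k(t, X) dV,
    where dW and dV are independent Gaussian increments with variance dt.
    """
    rng = random.Random(seed)
    t, x, y = 0.0, x0, 0.0
    xs, ys = [x], [y]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        dv = rng.gauss(0.0, math.sqrt(dt))
        x_next = x + f(t, x) * dt + g(t, x) * dw
        y = y + h(t, x) * dt + k(t, x) * dv  # observation uses the current state
        x = x_next
        t += dt
        xs.append(x)
        ys.append(y)
    return xs, ys

# Hypothetical example: a mean-reverting system observed through additive noise.
xs, ys = simulate(
    f=lambda t, x: -x, g=lambda t, x: 0.5,
    h=lambda t, x: x, k=lambda t, x: 1.0,
    x0=1.0, dt=0.01, n_steps=1000,
)
```

The filtering problem is then to estimate the hidden path `xs` from the observed path `ys` alone.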

All of these details require a carefully laid mathematical foundation in terms of 
axiomatic probability theory, formalized by Kolmogorov [1956], which unifies features
of discrete and continuous probability in terms of measure theory. It also allows us
to generalize probabilities to other spaces, including functional spaces which will 
be needed to describe stochastic processes. The first section in this chapter will 
overview some of the important properties of axiomatic probability theory, followed 
by a section on stochastic processes and white noise and a closing section devoted 
to solving the filtering problem. The presentation of topics in this section primarily
follows [van Handel 2007], with added insight from [Geremia 2008; Øksendal 2002;
Williams 1991].



2.1 Classical Probability Theory 

The basic ingredient of probability theory is the sample space $\Omega$ which describes the
set of all possible outcomes in the probabilistic system under consideration. In the
die example above, this would simply be $\Omega = \{1,2,3,4,5,6\}$, where the individual
outcomes $\omega \in \Omega$ label the different faces of the die. While we could ask questions
about individual outcomes, we are really more interested in related objects called
events, which are the yes or no questions one could ask about the system. Such
events are represented by subsets $A \subset \Omega$ where the elements $\omega \in A$ are those
corresponding to a yes answer of the related question. For our example, the question
"Did I roll an even number?" is represented by the subset $\{2, 4, 6\} \subset \Omega$ and the
basic question "Did I roll a 2?" is represented by the subset $\{2\} \subset \Omega$. The collection
of such subsets, corresponding to the collection of relevant yes/no questions, is itself
put into a set $\mathcal{F}$ which is called the $\sigma$-algebra over $\Omega$.

Definition 2.1. A $\sigma$-algebra $\mathcal{F}$ over $\Omega$ is a collection of subsets of $\Omega$ which satisfies

1. $\Omega \in \mathcal{F}$
2. If the set $A \in \mathcal{F}$, then the complement $A^c \in \mathcal{F}$
3. Countable unions $\bigcup_n A_n \in \mathcal{F}$ if each $A_n \in \mathcal{F}$



The first two requirements are not terribly surprising. Certainly the question,
"Did anything happen?" must be valid. Indeed, the most trivial $\sigma$-algebra valid for
any $\Omega$ is $\mathcal{F} = \{\emptyset, \Omega\}$. Similarly, if a particular binary question is acceptable, $A \in \mathcal{F}$,
we should tautologically be able to ask whether "not" of that question occurred, e.g.
"Did I not roll a 2?". This implies the complement $A^c \in \mathcal{F}$. The remaining and more
technical requirement relates to our general rule from the beginning of the chapter.
The intuitive idea is that for two events $A, B \in \mathcal{F}$, we should be able to combine
them to make the question "Did A or B happen?" ($A \cup B$) or the question "Did A
and B happen?" ($A \cap B$). The restriction to countable "or" compositions¹ prevents
the pathological case we had above for elements on the real line and by taking it as
an axiom, we can entirely avoid it.

The pair $(\Omega, \mathcal{F})$ is a mathematical object called a measurable space and elements
in $\mathcal{F}$ are called measurable sets. Such an object is defined precisely to sidestep the
issues with uncountably infinite compositions. From the name, we anticipate that a
measurable space is something we can define a measure on, which is just a convenient
way to talk about sizes of collections of elements in $\Omega$. For a probability theory, we
will want a specific measure $\mathbb{P}$ which assigns probabilities to events in a sensible way.
But the trick is to define the measure on sets in $\mathcal{F}$ and not directly on elements in
$\Omega$, thereby only defining the measure on sets which are sensible without having to
worry how those sets are composed from elements in $\Omega$. In other words, we need
not worry about decomposing an event which should have a non-zero probability,
e.g. an interval, in terms of its uncountably infinite constituents, which have zero
probability, e.g. points. This is encapsulated in the following definition.

Definition 2.2. A probability measure is a map $\mathbb{P} : \mathcal{F} \to [0, 1]$ which satisfies

1. For a countable collection $\{A_n : A_n \in \mathcal{F},\ A_n \cap A_m = \emptyset \text{ for } n \neq m\}$, $\mathbb{P}(\bigcup_n A_n) = \sum_n \mathbb{P}(A_n)$

2. $\mathbb{P}(\emptyset) = 0$, $\mathbb{P}(\Omega) = 1$

¹Note that the composition of "and" questions comes from having the complement of
sets in the $\sigma$-algebra.

The first part of the definition is precisely our general rule, but restricted to 
countable collections. The second part is just to set the baseline meanings which we 
expect for any probability theory; that the probability of nothing happening is zero 
and the probability of anything happening is one. 
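As a brief worked illustration, countable additivity is already enough to answer the earlier question about rational values. The sketch below assumes the uniform random variable $X$ on $[0,1]$ from the start of the chapter, that each event $\{X = q_n\}$ is measurable, and an enumeration $q_1, q_2, \ldots$ of the rationals in $[0,1]$ (notation introduced here only for this example):

```latex
\[
  \mathbb{P}\bigl(X \in \mathbb{Q} \cap [0,1]\bigr)
  = \mathbb{P}\Bigl(\bigcup_{n=1}^{\infty} \{X = q_n\}\Bigr)
  = \sum_{n=1}^{\infty} \mathbb{P}(X = q_n)
  = \sum_{n=1}^{\infty} 0
  = 0 .
\]
```

The same argument cannot be repeated for the whole interval, because $[0,1]$ is uncountable and so admits no such enumeration. This is why the contradiction in Eq. (2.5) does not arise.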

The tuple $(\Omega, \mathcal{F}, \mathbb{P})$ is called a probability space and formalizes the intuitive rules
we desire such that they apply for both discrete and continuous spaces. In essence,
the measure $\mathbb{P}$ is the workhorse, in that it encapsulates every probabilistic statement
we make regarding the theory. As such, $\mathbb{P}$ is often referred to as the state of a
random system and the probabilities it assigns to events are based on a physical
model, counting, betting odds or whatever perspective lets you sleep at night.

As a final introductory note, one might read that events $A$ for which $\mathbb{P}(A) = 1$
are said to occur "almost surely", abbreviated a.s. This statement reflects the fact 
that sets of measure zero may contribute to an event, even though they individually 
have zero probability. 

2.1.1 Generated $\sigma$-algebras and the Borel $\sigma$-algebra

For discrete spaces, the power set of $\Omega$ is an obvious choice for the $\sigma$-algebra, but
it turns out (again) to be more complicated for continuous spaces². For such spaces
(and for later purposes), it is convenient to have a method for generating a valid
$\mathcal{F}$ from a collection of events we know we are interested in. Consider a potentially
uncountable collection of subsets $\mathcal{F}_0 = \{A_i \subset \Omega\}$ which is not necessarily a $\sigma$-algebra.
In order to generate a $\sigma$-algebra from $\mathcal{F}_0$, we consider all $\sigma$-algebras which have $\mathcal{F}_0$
as a sub-collection. Taking the intersection of these $\sigma$-algebras also results in a
$\sigma$-algebra and is the smallest $\sigma$-algebra which contains all elements in $\mathcal{F}_0$. The result
of this operation, written $\mathcal{F} = \sigma\{A_i\}$, is called the $\sigma$-algebra generated by $\mathcal{F}_0$.

²For technical reasons beyond me, it turns out one can actually have too many sets in
$\mathcal{F}$ to define a consistent $\mathbb{P}$. Banach and Kuratowski [1929] actually showed
that no probability measure exists on the power set of $\mathbb{R}$ such that the probability of any
single point is zero.
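For a finite sample space, the generating procedure described above can be sketched directly in code. Rather than intersecting all candidate $\sigma$-algebras, the sketch below closes the generating sets under complement and union until nothing new appears, which yields the same smallest collection when $\Omega$ is finite. The function name `generate_sigma_algebra` and the choice of the "even roll" event are illustrative assumptions.

```python
from itertools import combinations

def generate_sigma_algebra(omega, generators):
    """Smallest collection containing the generators that is closed under
    complement and union. For a finite sample space this is the generated
    sigma-algebra, since countable unions reduce to finite ones."""
    omega = frozenset(omega)
    events = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(events)
        for a in current:
            if omega - a not in events:
                events.add(omega - a)
                changed = True
        for a, b in combinations(current, 2):
            if a | b not in events:
                events.add(a | b)
                changed = True
    return events

die = range(1, 7)
sigma_even = generate_sigma_algebra(die, [{2, 4, 6}])
# Contains exactly: the empty set, the evens, the odds and the whole space.
print(sorted(sorted(e) for e in sigma_even))
```

Closure under intersection does not need to be imposed separately, since it follows from complements and unions.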

Example 2.1 (Example 1.1.8 in van Handel [2007]). As a concrete example, consider
the six-sided die for which we generate a $\sigma$-algebra from the questions "Did we throw
a one?" and "Did we throw a four?"

\[
  \sigma\{\{1\}, \{4\}\} = \{\emptyset, \{1\}, \{4\}, \{1\}^c, \{4\}^c, \{1, 4\}, \{1, 4\}^c, \Omega\}. \tag{2.9}
\]

We see that a consistent $\sigma$-algebra implies that answering the two basic questions
also allows us to answer questions such as "Did we throw a one or a four?" and
"Did we not throw a one?". Really, the generated $\sigma$-algebra reflects all the yes/no
questions we can logically answer from observing its input set of events, which here
is knowledge of rolling a one or a four.

An important $\sigma$-algebra for continuous spaces is the Borel $\sigma$-algebra (on the
reals), defined as

Definition 2.3. The Borel $\sigma$-algebra (on $\mathbb{R}$), written $\mathcal{B}$, is the $\sigma$-algebra generated
from the set of all open intervals on $\mathbb{R}$. Note that this is a generated set, since
the complement of an open interval is a closed set, which is clearly not contained
within the set of open intervals.

2.1.2 Random Variables

Although a probability space is all we need to start discussing a probabilistic sys- 
tem, we are ultimately interested in more glamorous inquiries than simple yes/no 
questions. As physicists, we are particularly interested in describing observations or 
measurements we might make of the system, at which point we need to relate the
labels on the measuring device to properties of the system. In order to make this 
mapping precise, we first introduce the following definitions. 

Definition 2.4. Let $(\Omega, \mathcal{F})$ and $(S, \mathcal{S})$ be measurable spaces. The function
$X(\omega) : \Omega \to S$ is an $\mathcal{F}$-measurable function if $X^{-1}(A) = \{\omega \in \Omega : X(\omega) \in A\} \in \mathcal{F}$ for
every $A \in \mathcal{S}$.

Definition 2.5. An ($S$-valued) random variable is an ($\mathcal{F}$-)measurable function
$X(\omega) : \Omega \to S$ from the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ to the measurable space $(S, \mathcal{S})$. We will
often consider real-valued random variables, which map elements in the sample space
to $(\mathbb{R}, \mathcal{B})$ and which we will call simply random variables.

The notion of measurability is what really allows us to define probabilities on
random variables. In fact, if the random variable is $\mathcal{F}$-measurable, that means that
all yes/no questions needed to determine its value are contained within $\mathcal{F}$, so that
we need only invert the map $X$ to determine the associated probability. That is, the
probability for a random variable $X$ to take on a value in some set $A \in \mathcal{B}$ is written

\[
  \mathbb{P}(X \in A) = \mathbb{P}(X^{-1}(A)) = \mathbb{P}(\{\omega \in \Omega : X(\omega) \in A\}), \tag{2.10}
\]

where the first two forms are shorthand for the explicit form on the right. But it is
conceivable that $\mathcal{F}$ contains more yes/no questions than are actually needed for a
particular random variable $X$. As such, we can consider the $\sigma$-algebra generated by
a random variable

\[
  \mathcal{F}_X = \sigma\{X\} = \{X^{-1}(A) : A \in \mathcal{B}\}. \tag{2.11}
\]

This is actually a convenient way to generate a $\sigma$-algebra for $\Omega$ when we have a
collection of random variables we are interested in; simply take the smallest $\sigma$-algebra
which contains all those generated by each random variable in the set.

Abstractly, $\mathcal{F}_X$ encodes the information that we learn by measuring $X$. Such a
notion will be important when we consider conditioning and inference, when it will
be useful to relate $\sigma$-algebras generated from different random variables.
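The pull-back in Eq. (2.10) and the generated $\sigma$-algebra of Eq. (2.11) are easy to compute on a finite sample space. The following sketch uses the fair die with a hypothetical random variable `X` marking even faces. For a random variable with finitely many values, taking preimages of every subset of its range gives the same collection as ranging over all Borel sets. All names are illustrative.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))                 # die faces
P = {w: Fraction(1, 6) for w in OMEGA}         # fair-die measure on outcomes

def X(w):
    """Hypothetical random variable: 1 if the face is even, 0 otherwise."""
    return 1 if w % 2 == 0 else 0

def preimage(rv, values):
    """X^{-1}(A): all outcomes that the random variable maps into A."""
    return frozenset(w for w in OMEGA if rv(w) in values)

def prob_rv(rv, values):
    """P(X in A), computed by pulling A back to the sample space."""
    return sum((P[w] for w in preimage(rv, values)), Fraction(0))

def sigma_of(rv):
    """Sigma-algebra generated by rv: preimages of every subset of its range."""
    vals = sorted({rv(w) for w in OMEGA})
    subsets = [[v for i, v in enumerate(vals) if mask >> i & 1]
               for mask in range(2 ** len(vals))]
    return {preimage(rv, s) for s in subsets}

print(prob_rv(X, {1}))        # 1/2
print(len(sigma_of(X)))       # 4 events: empty set, odds, evens, whole space
```

Although the die supports 64 possible events, measuring `X` only ever answers four of them, which is the sense in which the generated $\sigma$-algebra encodes the information learned.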

Definition 2.6. For two random variables $X$ and $Y$ defined on the probability space
$(\Omega, \mathcal{F}, \mathbb{P})$, we say that $Y$ is $\mathcal{F}_X$-measurable (or simply $X$-measurable) if $\mathcal{F}_Y \subset \mathcal{F}_X$ or
equivalently, there exists a measurable function $\phi : \mathbb{R} \to \mathbb{R}$ such that $Y = \phi(X)$.

Example 2.2. Consider the probability space for throwing two coins, given by
$\Omega = \{HH, TT, HT, TH\}$ with $\mathcal{F}$ and $\mathbb{P}$ defined but unimportant for this example.
Further, define a boolean random variable $X$ by

\[
  X(HH) = X(TT) = 1, \qquad X(HT) = X(TH) = 0, \tag{2.12}
\]

which is the parity of the two tosses. It is straightforward to see that
$\mathcal{F}_X = \{\emptyset, \{HH, TT\}, \{HT, TH\}, \Omega\}$. Also consider the random variable $Y$ defined by

\[
  Y(HH) = Y(HT) = 1, \qquad Y(TH) = Y(TT) = 0, \tag{2.13}
\]

which corresponds to the outcome of the first toss and which has a generated
$\sigma$-algebra $\mathcal{F}_Y = \{\emptyset, \{HH, HT\}, \{TH, TT\}, \Omega\}$. We immediately see that $Y$ is not
$X$-measurable, and likewise $X$ is not $Y$-measurable, though in general measurability
need not be symmetric. This is completely sensible, as learning the outcome of the first
toss is not enough information to determine the parity of the two tosses together.

Related to measurability is the notion of independence:

Definition 2.7. Two random variables $X$, $Y$ defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$
are independent if $\mathbb{P}(A \cap B) = \mathbb{P}(A)\mathbb{P}(B)$ for all $A \in \mathcal{F}_X$, $B \in \mathcal{F}_Y$.

In contrast to measurability, in which one variable can be determined exactly by
knowing the value of the other, independent variables share no information. That
is, knowing the value of $X$ tells you absolutely nothing about the value of $Y$. Note
that independence is a property of the probability measure $\mathbb{P}$, whereas measurability
only depends on the structure of the $\sigma$-algebras generated by the random variables.
Additionally, just because a random variable is not measurable with respect to
another, the two are not necessarily independent. This is generally the case we will be
interested in for filtering, when we learn partial information about related random
variables when given the value of a particular one.
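The distinction between measurability and independence can be checked by brute force on the two-coin space. In the sketch below, the measures `fair` and `biased` are assumptions added for illustration, since the example leaves $\mathbb{P}$ unspecified. Both treat the tosses as independent, with heads probability 1/2 and 3/4 respectively.

```python
from fractions import Fraction
from itertools import product

OMEGA = ["HH", "HT", "TH", "TT"]

def X(w):  # parity of the two tosses, Eq. (2.12)
    return 1 if w in ("HH", "TT") else 0

def Y(w):  # outcome of the first toss, Eq. (2.13)
    return 1 if w[0] == "H" else 0

def sigma_of(rv):
    """Sigma-algebra generated by a {0, 1}-valued random variable."""
    ones = frozenset(w for w in OMEGA if rv(w) == 1)
    zeros = frozenset(OMEGA) - ones
    return {frozenset(), ones, zeros, frozenset(OMEGA)}

def product_measure(p_heads):
    """Measure for two independent tosses with heads probability p_heads."""
    def toss(c):
        return p_heads if c == "H" else 1 - p_heads
    return {w: toss(w[0]) * toss(w[1]) for w in OMEGA}

def prob(measure, event):
    return sum((measure[w] for w in event), Fraction(0))

def independent(measure, rv1, rv2):
    """Check P(A & B) == P(A) P(B) over all pairs of generated events."""
    return all(prob(measure, a & b) == prob(measure, a) * prob(measure, b)
               for a, b in product(sigma_of(rv1), sigma_of(rv2)))

fair = product_measure(Fraction(1, 2))
biased = product_measure(Fraction(3, 4))   # hypothetical biased coins

F_X, F_Y = sigma_of(X), sigma_of(Y)
print(F_Y <= F_X, F_X <= F_Y)     # False False: neither is measurable w.r.t. the other
print(independent(fair, X, Y))    # True
print(independent(biased, X, Y))  # False
```

The measurability check never consults the measure, while the independence check changes its answer when the coins are biased. With biased coins, the first toss carries partial information about the parity, which is the situation of interest for filtering.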

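Note that every random variable induces a probability measure on the reals given
by

\[
  \mu_X(B) = \mathbb{P}(X^{-1}(B)), \qquad B \in \mathcal{B}. \tag{2.14}
\]

We call $\mu_X$ the distribution of the random variable $X$. A particularly important and
familiar random variable is a Gaussian random variable $X : \Omega \to \mathbb{R}$ with mean $\mu$
and variance $\sigma^2$, which has the distribution

\[
  \mu_X(B) = \int_B \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) dx. \tag{2.15}
\]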
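For an interval $B = [a, b]$, the Gaussian distribution above can be evaluated in closed form through the error function. The sketch below is one way to do this; the function name `gaussian_measure` and its default parameters are illustrative.

```python
import math

def gaussian_measure(a, b, mu=0.0, sigma=1.0):
    """mu_X([a, b]) for a Gaussian random variable, via the error function."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return cdf(b) - cdf(a)

print(gaussian_measure(-1.0, 1.0))   # roughly 0.6827, the one-sigma rule
print(gaussian_measure(0.3, 0.3))    # 0.0: single points carry no probability
```

As with the uniform distribution, individual points have zero measure even though intervals do not.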

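Definition 2.8. A very useful random variable is the indicator function
$\chi_A : \Omega \to \{0, 1\}$, defined for $A \in \mathcal{F}$ to be

\[
  \chi_A(\omega) = \begin{cases} 1 & \omega \in A \\ 0 & \omega \notin A \end{cases}. \tag{2.16}
\]

We can use indicator functions to rewrite a general random variable $X$ over the sets
$S_i$ on which it is constant, which provide a partition of $\Omega$, e.g. $X(\omega \in S_i) = x_i$ where
$\bigcup_i S_i = \Omega$. We then have

\[
  X(\omega) = \sum_i x_i \chi_{S_i}(\omega). \tag{2.17}
\]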

i 

As we shall see in the following section, indicator functions are useful as they allow
us to work exclusively with expectations, rather than directly with the probability
measure, since for any $A \in \mathcal{F}$, $\mathbb{P}(A) = \mathbb{E}[\chi_A]$. This will allow us to gloss over
conditional probability and focus instead on conditional expectations, which are more
relevant for filtering.
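Both uses of indicator functions can be illustrated on the fair die: recovering a probability as an expectation, and rebuilding a random variable from the sets on which it is constant. In the sketch below, the values 0, 10 and 20 and the partition sets are hypothetical, chosen only to make the decomposition visible.

```python
from fractions import Fraction

OMEGA = range(1, 7)
P = {w: Fraction(1, 6) for w in OMEGA}

def indicator(event):
    """chi_A: returns 1 on outcomes in A and 0 elsewhere."""
    return lambda w: 1 if w in event else 0

def expectation(rv):
    """E[X] for a random variable on the finite sample space."""
    return sum((rv(w) * P[w] for w in OMEGA), Fraction(0))

# P(A) = E[chi_A] for the event "roll is even".
A = {2, 4, 6}
print(expectation(indicator(A)))   # 1/2

# Decomposition X = sum_i x_i chi_{S_i} over a partition {S_i} of the sample space.
partition = {0: {1, 3, 5}, 10: {2, 4}, 20: {6}}   # hypothetical values x_i -> sets S_i

def X(w):
    return sum(x_i * indicator(S_i)(w) for x_i, S_i in partition.items())

print([X(w) for w in OMEGA])       # [0, 10, 0, 10, 0, 20]
```

Because the sets in `partition` are disjoint, exactly one indicator is nonzero for each outcome, so the sum picks out the single value attached to that outcome.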
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2.1.3 Expectation 

The notion of expectation is another topic most are famihar with from previous work 
with probabihty. Conceptually, it corresponds to the average value of a random 
variable one would expect in the limit of repeating many trials of the underlying 
probability experiment. For simple random variables X which take on a finite number 
of values Xi,X2, ■ ■ ■ , Xn, the expectation above reduces to the familiar form 



where the expectation is well-defined so long as the possible values are finite. For a 
continuous-valued random variable X, we define it be a nondecreasing sequence Xn 
which converges to X and set E [X] = lim„^oo E [X„]. One can show [Williams 1991] 
that such a procedure uniquely converges to the following definition. 

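Definition 2.9. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space with a random variable $X$. The
expectation of $X$ with respect to the measure $\mathbb{P}$ is

$$\mathbb{E}[X] = \int_\Omega X(\omega) \, \mathbb{P}(d\omega), \tag{2.19}$$

where the integral is interpreted in the Lebesgue sense.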

The fact that we extend to the continuous case via the integral above should come 
as no surprise as that is how we extend sums to the familiar Riemann integral in 
calculus. But given that probability theories are defined on more general measurable 
spaces, we use a more general integral — the Lebesgue integral, which allows us to 
integrate measurable functions (via P), unlike the Riemann integral which only allows 
us to integrate continuous functions. 
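
As a small illustration of the finite case, the sketch below computes the expectation of a simple random variable on a fair six-sided die and checks that the probability of an event equals the expectation of its indicator function. The die model, the helper names `expectation` and `indicator`, and the chosen event are assumptions made only for this example.

```python
from fractions import Fraction

# Assumed toy model: a fair six-sided die, Omega = {1, ..., 6}.
omega = range(1, 7)
P = {w: Fraction(1, 6) for w in omega}

def expectation(X):
    """E[X] as a finite sum of X(w) * P({w}) over all outcomes w."""
    return sum(X(w) * P[w] for w in omega)

def indicator(A):
    """Indicator function of the event A: 1 on A, 0 elsewhere."""
    return lambda w: 1 if w in A else 0

face_value = lambda w: w      # the random variable X(w) = w
even = {2, 4, 6}              # the event "the roll is even"

mean_face = expectation(face_value)
prob_even = expectation(indicator(even))   # P(A) recovered as E[chi_A]
```

Exact rational arithmetic is used so that the identities hold without rounding error.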

When studying stochastic processes, we will find it very useful to refer to the 
following classification of random variables in terms of their expectation. 



Definition 2.10. For a random variable $X$ and $p \geq 1$, define $\|X\|_p = (\mathbb{E}[|X|^p])^{1/p}$.
A random variable is p-integrable if $\|X\|_p < \infty$. For $p = 2$, such a random variable
is square-integrable. A random variable satisfying $|X| \leq K$ for some $K \in \mathbb{R}$ is called
bounded and $\|X\|_\infty$ is the smallest $K$ which bounds $X$.

Using this definition, we can introduce the spaces $\mathcal{L}^p(\Omega, \mathcal{F}, \mathbb{P}) = \{X : \|X\|_p < \infty\}$,
which are common spaces in functional analysis. Of particular use is the space $\mathcal{L}^2$,
which for $\Omega = \mathbb{R}$ is almost^1 the familiar space of square-integrable functions. As
such, we will often make use of the implied inner product

$$\langle X, Y \rangle = \mathbb{E}[XY] = \int_\Omega X(\omega) Y(\omega) \, \mathbb{P}(d\omega), \tag{2.20}$$

which will allow for an intuitively pleasing interpretation of the conditional expecta- 
tion as an orthogonal projection. 



2.1.4 Conditioning 

Given all the above groundwork, we are now ready to tackle the important task of 
conditioning. As hinted at above, we will focus on conditional expectation, since the 
rules of conditional probability are readily recovered using indicator functions. To 
get a feel for things, and to appreciate the need for the more technical machinery to 
come, we begin with a straightforward definition for discrete spaces. 

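Definition 2.11. For a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, consider the discrete random
variables $X$ and $Y$. Suppose $Y$ yields a finite partition of $\Omega$ (as in Eq. (2.17)) in
terms of sets $A_k$ for $k = 1, \ldots, n$. Then the conditional expectation of $X$ given $Y$ is

$$\mathbb{E}[X|Y](\omega) = \sum_{k=1}^{n} \frac{\mathbb{E}[\chi_{A_k} X]}{\mathbb{P}(A_k)} \chi_{A_k}(\omega), \tag{2.21}$$

where $\mathbb{E}[\chi_{A_k} X]/\mathbb{P}(A_k)$ is arbitrary if $\mathbb{P}(A_k) = 0$.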



^1 $\|\cdot\|_2$ is not quite a norm because $\|X\|_2 = 0$ only implies that $X = 0$ almost surely under
the measure $\mathbb{P}$, not that the function is identically $X(\omega) = 0$ for all $\omega$.

How do we interpret this definition? Firstly, we see that the conditional expectation
is simply another (discrete) random variable, expanded in terms of the indicator
functions $\chi_{A_k}$ or written in "the basis" of $Y$. Also, note that the actual values $Y$
takes on are irrelevant; we are only interested in them so far as they allow us to
identify the different sets $A_k$. The term $\mathbb{E}[\chi_{A_k} X]/\mathbb{P}(A_k)$ averages $X$ only over the events
which correspond to $A_k$, dividing by $\mathbb{P}(A_k)$ to renormalize for this subset. Note that
there is an arbitrariness when $\mathbb{P}(A_k) = 0$, since that event does not happen (a.s.).
The averaging is done for each partitioning set $A_k$, so that once handed a particular
value $y$ of $Y$, the conditional expectation returns the value of $X$ averaged over the
appropriate partition for $y$. As we will soon make precise, $\mathbb{E}[X|Y]$ can be interpreted
as the random variable which returns an estimate of $X$ when given the value of $Y$.
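
The discrete definition in Eq. (2.21) can be sketched directly in code. The example below is only an illustration under assumed choices: two fair dice as the sample space, $X$ the total and $Y$ the first die, and a hypothetical helper `conditional_expectation`. Every partition set has positive probability here, so the arbitrary case $\mathbb{P}(A_k) = 0$ is not handled.

```python
from fractions import Fraction
from itertools import product

# Assumed toy model: two fair dice, all 36 outcomes equally likely.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}

X = lambda w: w[0] + w[1]   # total of both dice
Y = lambda w: w[0]          # value of the first die only

def conditional_expectation(X, Y):
    """Return E[X|Y] as a function on Omega, following Eq. (2.21)."""
    # The sets A_k = {w : Y(w) = y_k} partition the sample space.
    partition = {}
    for w in omega:
        partition.setdefault(Y(w), []).append(w)
    # On each A_k the value is E[chi_{A_k} X] / P(A_k).
    value_on = {}
    for y, A in partition.items():
        p_A = sum(P[w] for w in A)
        value_on[y] = sum(X(w) * P[w] for w in A) / p_A
    return lambda w: value_on[Y(w)]

cond = conditional_expectation(X, Y)
```

The returned object is itself a random variable on $\Omega$ that depends on the outcome only through the value of $Y$; averaging it over $\Omega$ gives back $\mathbb{E}[X]$.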

For the usual reasons, this simple definition needs work to be extended to the
continuous case. Suppose $Y$ were actually a real-valued random variable. It may
not generate a finite partition of the continuous sample space $\Omega$, which may have
uncountably many elements. More importantly, since $\mathbb{P}(Y = y) = 0$ for any point
$y$ on the real line, the arbitrary case above actually turns into a nightmare; if we
were to take the partitions $A_k$ to be points, then the entire conditional expectation is
arbitrary! A healthy dose of measure theory shows that one can define the conditional
expectation in terms of a sequence of approximating discrete versions (which proves
existence and uniqueness), but the technicalities are not particularly enlightening for
us (see van Handel [2007]). But it is important to know what definition ultimately
works, so we will instead simply use the following (Kolmogorov) axiomatic definition.

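Definition 2.12. Let $X$ be a random variable on $(\Omega, \mathcal{F}, \mathbb{P})$ and $\mathcal{G} \subseteq \mathcal{F}$ be any σ-algebra on
the sample space $\Omega$. The conditional expectation $\mathbb{E}[X|\mathcal{G}]$ is the unique $\mathcal{G}$-measurable
random variable which satisfies $\mathbb{E}[\chi_A X] = \mathbb{E}[\chi_A \mathbb{E}[X|\mathcal{G}]]$ for all $A \in \mathcal{G}$.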

Rather than conditioning directly on a particular σ-algebra, we often instead
condition on one generated by another random variable, as was done in the discrete
case above. As a short-hand, we write $\mathbb{E}[X|Y]$ to indicate the more precise form of
$\mathbb{E}[X|\mathcal{F}_Y]$, where $\mathcal{F}_Y$ is the σ-algebra generated by the random variable $Y$.

From the perspective of statistical inference, the following theorem shows that
we can interpret the conditional expectation $\mathbb{E}[X|\mathcal{G}]$ as the best estimate of $X$, in a
least-squares sense, given the information in $\mathcal{G}$.

Theorem 2.1 (Proposition 2.3.3 in [van Handel 2007]). Given $X$ and $\mathcal{G}$ as in
Def. 2.12, $\mathbb{E}[X|\mathcal{G}]$ is the unique $\mathcal{G}$-measurable random variable that satisfies

$$\mathbb{E}[(X - \mathbb{E}[X|\mathcal{G}])^2] = \min_{Y \in \mathcal{L}^2(\mathcal{G})} \mathbb{E}[(X - Y)^2], \tag{2.22}$$

where $\mathcal{L}^2(\mathcal{G}) = \{Y \in \mathcal{L}^2 : \mathcal{F}_Y \subseteq \mathcal{G}\}$. We therefore call $\mathbb{E}[X|\mathcal{G}]$ the least-mean-square
estimate of $X$ given $\mathcal{G}$.

We can actually interpret this statement as the orthogonal projection of $X$ onto the
linear subspace $\mathcal{L}^2(\mathcal{G}) \subset \mathcal{L}^2$ with respect to the inner product in Eq. (2.20).

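Proof. For all $Y \in \mathcal{L}^2(\mathcal{G})$, we can write

$$\mathbb{E}[(X - Y)^2] = \mathbb{E}[(X - \mathbb{E}[X|\mathcal{G}] + \mathbb{E}[X|\mathcal{G}] - Y)^2], \tag{2.23}$$

where $\Delta = \mathbb{E}[X|\mathcal{G}] - Y$ is $\mathcal{G}$-measurable, by definition of the conditional expectation
and $Y$. Rewriting, then

$$\mathbb{E}[(X - Y)^2] = \mathbb{E}[(X - \mathbb{E}[X|\mathcal{G}] + \Delta)^2] \tag{2.24}$$
$$= \mathbb{E}[(X - \mathbb{E}[X|\mathcal{G}])^2] + 2\mathbb{E}[\Delta(X - \mathbb{E}[X|\mathcal{G}])] + \mathbb{E}[\Delta^2]. \tag{2.25}$$

But by the Kolmogorov definition of conditional expectation (Def. 2.12), we have

$$\mathbb{E}[\Delta \, \mathbb{E}[X|\mathcal{G}]] = \mathbb{E}[\Delta X], \tag{2.26}$$

so that the middle term above is identically zero, leaving

$$\mathbb{E}[(X - Y)^2] = \mathbb{E}[(X - \mathbb{E}[X|\mathcal{G}])^2] + \mathbb{E}[\Delta^2]. \tag{2.27}$$

Since $\mathbb{E}[\Delta^2] \geq 0$, the expression is minimized when $\Delta = 0$, which is precisely the
least-squares property. This coincides with the geometric interpretation, since if $\Delta \in \mathcal{L}^2(\mathcal{G})$
and $\mathbb{E}[X|\mathcal{G}]$ is the orthogonal projection of $X$ onto $\mathcal{L}^2(\mathcal{G})$, then $X - \mathbb{E}[X|\mathcal{G}] \perp \mathcal{L}^2(\mathcal{G})$ and
therefore $\langle X - \mathbb{E}[X|\mathcal{G}], \Delta \rangle = \mathbb{E}[(X - \mathbb{E}[X|\mathcal{G}])\Delta] = 0$.

If the conditional expectation were not unique, then there would exist some other
$\mathcal{G}$-measurable random variable $Y'$ that also minimizes $\mathbb{E}[(X - Y)^2]$ over all $Y$. As
demonstrated above, this would mean $\mathbb{E}[(X - Y')^2] = \mathbb{E}[(X - \mathbb{E}[X|\mathcal{G}])^2]$. But we
could equally well write

$$\mathbb{E}[(X - Y')^2] = \mathbb{E}[(X - \mathbb{E}[X|\mathcal{G}])^2] + \mathbb{E}[(\mathbb{E}[X|\mathcal{G}] - Y')^2], \tag{2.28}$$

where the cross term again disappears due to orthogonality. If $Y'$ is truly a minimum,
we must have $\mathbb{E}[(\mathbb{E}[X|\mathcal{G}] - Y')^2] = 0$ or really $Y' = \mathbb{E}[X|\mathcal{G}]$ (a.s.). □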
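
The decomposition in Eq. (2.27) can be checked exactly on a finite sample space. The sketch below assumes two fair dice, takes $X$ to be their product and conditions on the first die $Y$; since the dice are independent, $\mathbb{E}[X|Y] = \tfrac{7}{2} Y$. The competing estimator `guess` is an arbitrary function of $Y$ chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

# Assumed toy model: two fair dice, each outcome with probability 1/36.
omega = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)

def E(Z):
    """Expectation of Z over the finite sample space."""
    return sum(Z(w) * p for w in omega)

X = lambda w: w[0] * w[1]                 # product of the two dice
best = lambda w: Fraction(7, 2) * w[0]    # E[X|Y] for Y = first die
guess = lambda w: 3 * w[0] + 1            # some other function of Y

err_best = E(lambda w: (X(w) - best(w)) ** 2)
err_guess = E(lambda w: (X(w) - guess(w)) ** 2)
gap = E(lambda w: (best(w) - guess(w)) ** 2)                  # E[Delta^2]
cross = E(lambda w: (X(w) - best(w)) * (best(w) - guess(w)))  # cross term
```

The cross term vanishes, so the error of any other estimator built from $Y$ exceeds the conditional-expectation error by exactly $\mathbb{E}[\Delta^2]$.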

2.1.5 Radon-Nikodym Theorem 

Another definition of conditional expectation is in terms of the Radon-Nikodym the- 
orem. Although the Kolmogorov definition is perfectly adequate for our purposes, 
studying this alternate definition will introduce concepts that are essential in devel- 
oping the filtering equations and will be revisited when studying the stability of the 
quantum parameter estimation filter in Chapter 4. 

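Definition 2.13. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. A probability measure $\mathbb{Q}$ is
absolutely continuous with respect to $\mathbb{P}$, written $\mathbb{Q} \ll \mathbb{P}$, if $\mathbb{Q}(A) = 0$ for all events
$A \in \mathcal{F}$ where $\mathbb{P}(A) = 0$.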

Absolute continuity is an important concept when we are interested in changing
probability measures, which is essentially a change of variables technique to allow
for easier calculations (much like the change of variables technique in calculus). The
above definition tells us when such a change of variables is even possible.

The basic technique of transformation is as follows. Let $f(\omega)$ be a nonnegative
random variable on $(\Omega, \mathcal{F}, \mathbb{P})$ satisfying $\mathbb{E}[f] = 1$. For any $A \in \mathcal{F}$, we define the new
measure $\mathbb{Q}$ as

$$\mathbb{Q}(A) = \mathbb{E}_\mathbb{P}[\chi_A f] = \int_A f(\omega) \, \mathbb{P}(d\omega), \tag{2.29}$$

where $\mathbb{Q}$ satisfies the requirements of a probability measure, i.e. $\mathbb{Q}(\emptyset) = 0$, $\mathbb{Q}(\Omega) = 1$
since $\mathbb{E}[f] = 1$, and countable additivity over disjoint sets follows directly from
the definition of expectation and the measure $\mathbb{P}$. We can then relate the
expectations under either measure for some other random variable $g(\omega)$ as

$$\mathbb{E}_\mathbb{Q}[g] = \int_\Omega g(\omega) \, \mathbb{Q}(d\omega) = \int_\Omega g(\omega) f(\omega) \, \mathbb{P}(d\omega) = \mathbb{E}_\mathbb{P}[g f]. \tag{2.30}$$

The function $f$ above is called the density of the measure $\mathbb{Q}$ with respect to the
measure $\mathbb{P}$ and is written $d\mathbb{Q}/d\mathbb{P}$.
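
A finite example may make the change of measure concrete. The sketch below assumes a fair die under $\mathbb{P}$ and a hypothetical density $f(\omega) = 6\omega/21$, which reweights the die so that face $\omega$ has probability $\omega/21$ under $\mathbb{Q}$. The helper names `E_P`, `Q` and `E_Q` are introduced only for this illustration.

```python
from fractions import Fraction

# Reference measure P: a fair six-sided die.
omega = range(1, 7)
P = {w: Fraction(1, 6) for w in omega}

# Assumed density f = dQ/dP: nonnegative with E_P[f] = 1.
f = {w: Fraction(6 * w, 21) for w in omega}

def E_P(g):
    """Expectation under the reference measure P."""
    return sum(g(w) * P[w] for w in omega)

def Q(A):
    """New measure of the event A, Q(A) = E_P[chi_A f]."""
    return E_P(lambda w: f[w] if w in A else 0)

def E_Q(g):
    """Expectation under Q computed under P, E_Q[g] = E_P[g f]."""
    return E_P(lambda w: g(w) * f[w])

total_mass = Q(set(omega))
mean_under_Q = E_Q(lambda w: w)
```

Every event with zero probability under $\mathbb{P}$ necessarily has zero probability under $\mathbb{Q}$ in this construction, whatever density is chosen.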

If we think for a little, we immediately see that independent of a choice of $f$, events
which have probability measure zero under $\mathbb{P}$ must also have probability measure
zero under $\mathbb{Q}$; there is no $f$ such that $f(\omega)\mathbb{P}(d\omega)$ can be non-zero if $\mathbb{P}(d\omega) = 0$. This
observation is formalized in the following theorem, for which we omit the proof.

Theorem 2.2 (Radon-Nikodym). Consider the measures $\mathbb{P}, \mathbb{Q}$ on the measurable
space $(\Omega, \mathcal{F})$ such that $\mathbb{Q} \ll \mathbb{P}$. Then there exists a unique $\mathcal{F}$-measurable random
variable $f$ with $\mathbb{E}_\mathbb{P}[f] = 1$ such that $\mathbb{E}_\mathbb{P}[\chi_A f] = \mathbb{Q}[A]$ for all $A \in \mathcal{F}$. We therefore
call $f$ the density or Radon-Nikodym derivative, $d\mathbb{Q}/d\mathbb{P}$.

Although the theorem simply formalizes our intuition, the important part is that if
it exists, the Radon-Nikodym derivative is unique. In particular, if we were to follow
the technical route, and define the conditional expectation as a sequence of finite
approximations, we would find that it converges to

$$\mathbb{E}[X|\mathcal{G}] = \frac{d\mathbb{Q}|_\mathcal{G}}{d\mathbb{P}|_\mathcal{G}}, \qquad \mathbb{Q}(A) = \mathbb{E}_\mathbb{P}[\chi_A X], \tag{2.31}$$

where $\mathbb{Q}|_\mathcal{G}$ indicates the measure is restricted to the σ-algebra $\mathcal{G}$. Since Theorem
2.2 shows that this derivative is unique, so too is the conditional expectation and we
need not worry about the ambiguities left over from extending the discrete definition.

2.1.6 Summary 

Before moving on to stochastic processes, let's highlight what we have learned so far.
Foremost is that dealing with continuous probability spaces is not a trivial extension
of the intuitive rules most are familiar with. Fortunately, by defining probability
spaces, random variables and expectations using measure theory, we can overcome
most of the technical issues. As such, the basic definition of a probability space is
in terms of the measurable space $(\Omega, \mathcal{F})$ and the measure $\mathbb{P}$. The σ-algebra $\mathcal{F}$ is
used to encode the yes/no questions one could ask about the outcomes $\omega$ in the
sample space $\Omega$. Random variables are one step up from the σ-algebra, and provide
a mapping of outcomes in $\Omega$ to some other measurable space, on which a measure
is induced via $\mathbb{P}$. Essentially, random variables allow us to work with quantities of
interest which are not simple yes/no questions regarding the original sample space $\Omega$.
Measurability tells us when one random variable's value is determined entirely by the
value of another; independence tells us when random variables' values are completely
unrelated.

From there, we introduced the concept of expectation, which is the average value
of a random variable expected after repeated sampling from the given probability
model. Expectation induces an "almost" inner product on the space of random
variables. This picture provides a nice interpretation of conditional expectation, in
which we create a new random variable $\mathbb{E}[X|Y]$ which returns the average of $X$ when
given the value of $Y$. This is equivalent to a least-squares projection, in terms of
the aforementioned inner product, of $X$ onto the space of $\mathcal{F}_Y$-measurable random
variables. These basic ingredients will be important as we move on to consider more
complex probability concepts.


2.2 Classical Stochastic Processes 



As we steadily move towards discussion of dynamic stochastic systems and the pro- 
cessing of stochastic signals, we will use the following definitions to imbue our pre- 
vious probability constructions with a notion of time. 

Definition 2.14. A stochastic process is a map

$$X_t(\omega) : \mathbb{R}_+ \times \Omega \to \mathbb{R}, \tag{2.32}$$

where the argument $t$ is interpreted as time.

We see this is nothing more than a family of random variables labeled by the
increasing and positive index $t$. For a given $\omega_i \in \Omega$, the function $X_t(\omega_i)$ traces out a
trajectory in time. As time passes, our ability to answer yes-no questions about
events in $\mathcal{F}$ grows, and we should be able to partition the σ-algebra into questions
which may or may not be answerable given the information we have now. For the
die example, in which the stochastic process is repeated rolls, we can answer the
question "Was each roll a one up to time $t_1$?" at time $t_1$ and certainly no sooner.
Additionally, once we are able to answer this question, we should be able to do so for
eternity. That is, there is no way we can "unlearn" information about events. Such
a filtration of $\mathcal{F}$ is formalized in the following definition.

Definition 2.15. The elementary space $(\Omega, \mathcal{F}, \mathbb{P})$ admits a filtration in terms of an
increasing sequence of σ-algebras, labeled $\mathcal{F}_t \subseteq \mathcal{F}$, where $\mathcal{F}_s \subseteq \mathcal{F}_t$ for $s < t$.

Note that many filtrations exist on a probability space, though we are often
interested in one generated by a particular stochastic process, which may be written

$$\mathcal{F}_t^X = \sigma\{X_s^{-1}(B) : B \in \mathcal{B}, \ s \leq t\}. \tag{2.33}$$

Given this extra structure for the σ-algebra, measurability may also be defined
relative to the passage of time.

Definition 2.16. Consider the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with filtration $\mathcal{F}_t$. The
stochastic process $X_t$ is called $\mathcal{F}_t$-adapted if $X_s$ is $\mathcal{F}_t$-measurable for every $s \leq t$.

Adapted processes encompass most of the stochastic processes that we will consider
in filtering theory. Intuitively, these processes are ones that do not look into the
future, in that at time $t$, the values of the entire stochastic history up to that time,
$\{X_{s \leq t}\}$, are completely determined by the yes-no questions answerable at time $t$,
represented by $\mathcal{F}_t$.

An important class of stochastic processes are those whose future values are best
estimated by their current value. Such processes are called martingales.

Definition 2.17. A stochastic process $X_t$ is an $\mathcal{F}_t$-martingale if it is $\mathcal{F}_t$-adapted,
has bounded expectation ($\mathbb{E}[|X_t|] < \infty$ for all $t$) and satisfies $\mathbb{E}[X_t|\mathcal{F}_s] = X_s$ for all
$s \leq t$.
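
The defining property $\mathbb{E}[X_t|\mathcal{F}_s] = X_s$ can be verified exactly for a simple discrete-time case. The sketch below assumes a symmetric random walk built from fair $\pm 1$ steps (a choice made for this example, not a process discussed above) and enumerates all equally likely continuations of each observed history; `conditional_mean_of_walk` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product

def conditional_mean_of_walk(prefix, t):
    """E[X_t | first len(prefix) steps], where X_t sums the first t fair +/-1 steps."""
    s = len(prefix)
    futures = list(product((-1, 1), repeat=t - s))   # equally likely continuations
    total = sum(sum(prefix) + sum(future) for future in futures)
    return Fraction(total, len(futures))

s, t = 3, 6
# For every possible history up to step s, pair X_s with E[X_t | history].
checks = [
    (sum(prefix), conditional_mean_of_walk(prefix, t))
    for prefix in product((-1, 1), repeat=s)
]
```

For each of the eight histories, the conditional mean of the later value equals the current value, which is the martingale property for this walk.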

Martingales are perhaps best appreciated in terms of their etymological roots in
gambling theory. If we let the stochastic process $X_s$ represent our winnings at time $s$,
then $\mathbb{E}[X_t|\mathcal{F}_s]$ represents our expected future winnings at time $t$, given our knowledge
of events up to time $s < t$. If the game is fair, which is in our best interests, but still
worth playing, which is certainly in the best interests of the casino, this expectation
should be $X_s$. That is, on average, we expect to come out even when playing the
game. It turns out that this simple property has far reaching implications and is a
powerful tool for proving other properties of stochastic processes.

2.2.1 White Noise and the Wiener Process

For a given stochastic process $X_t$, it will be desirable to formulate an equation of
motion which describes its time-evolution, in which a noise term traces out an
individual trajectory or realization appropriate for a given probability measure. As an
equation, we thus desire

$$\frac{dX_t}{dt} = a(t, X_t) + b(t, X_t) \, \xi_t, \tag{2.34}$$

where I have intentionally been imprecise in representing the noise term $\xi_t$. As we shall
see, the noise term is mathematically difficult to handle in general and even in the
particular case when noise is white^2, i.e. delta-correlated in time with a flat power
spectrum. White noise is common in engineering and physics due to its simple properties
and fairly broad applications, from modeling random walks to financial derivative
prices. Before formally developing a sensible equation of motion for processes driven
by white noise, let's first consider an example which highlights the difficulties involved
in simply formulating white noise as a stochastic process.

Example 2.3 (From Introduction in van Handel [2007]). Consider a discrete-time,
noisy channel, in which at time-step $n$ the message $a_n$ is transmitted and received as
$x_n = a_n + \xi_n$. The noise $\xi_n$ can be assumed to be independent and identically distributed
(i.i.d.) at different times, as the noisy channel quickly loses traces of its previous
state. Moreover, if $\xi_n$ is really the sum of many independent effects, the central limit
theorem suggests that it should be Gaussian distributed. We therefore can take $\xi_n$
to be discrete time Gaussian white noise with some mean and variance.

Extending to a continuous-time model, our intuition tells us to replace the discrete
label $n$ with the continuous label $t$. Assuming zero-mean and unit-variance for the
noise, we then have $\mathbb{E}[\xi_t] = 0$, $\mathbb{E}[\xi_s \xi_t] = 0$ if $s \neq t$ and $\mathbb{E}[\xi_t^2] = 1$. Now suppose
we transmit a message $a_0$ as $x_t = a_0 + \xi_t$. Consider the time averaged process over
a small interval $[0, \epsilon]$,

$$X_\epsilon = \frac{1}{\epsilon} \int_0^\epsilon x_t \, dt = a_0 + \Xi_\epsilon, \tag{2.35}$$

where $\Xi_\epsilon = \frac{1}{\epsilon} \int_0^\epsilon \xi_t \, dt$. Clearly $\mathbb{E}[\Xi_\epsilon] = 0$, but more interestingly^3

$$\mathbb{E}[\Xi_\epsilon^2] = \frac{1}{\epsilon^2} \int_0^\epsilon \int_0^\epsilon \mathbb{E}[\xi_s \xi_t] \, ds \, dt = 0. \tag{2.36}$$

Thus to decode the message, we simply time average $x_t$ for an arbitrarily short
amount of time.

^2 Note that the "color" of the noise has to do with the correlation properties of a
stochastic process; it says nothing about the distribution of the noise itself. For the most part,
we will consider Gaussian white noise processes, which are delta-correlated in time with
Gaussian distributed increments.

This is most likely not the model we envisioned; we expect some effort is needed
to recover the corrupted message. Rather than working directly with $\xi_t$, we could
instead focus on the time-averaged process. Clearly $\Xi_1$ is a zero mean, Gaussian
random variable with unit variance. If we want to retain the independence of noise
at different times, this suggests that $\int_0^{1/2} \xi_t \, dt$ and $\int_{1/2}^{1} \xi_t \, dt$ are also independent,
mean-zero random variables, but now with variance 1/2. Generalizing, we can then
introduce the Wiener process

$$W_t = \int_0^t \xi_s \, ds. \tag{2.37}$$

We will formalize this slightly in a bit, but the idea is that Gaussian white noise is
the time derivative of the Wiener process, $dW_t/dt$. However, we will find that the
Wiener process is non-differentiable almost everywhere. Indeed, given that
$\mathbb{E}[W_s W_t] = \min(s, t)$ due to the independence of different increments, we have

$$\mathbb{E}[\xi_s \xi_t] = \frac{d}{dt}\frac{d}{ds} \mathbb{E}[W_s W_t] = \frac{d}{dt}\frac{d}{ds} \frac{s + t - |t - s|}{2} = \frac{d}{dt} \frac{1 + \mathrm{sign}(t - s)}{2} = \delta(t - s), \tag{2.38}$$

which is the Dirac delta "function", a manifestly non-differentiable object. Moreover,
given that $\delta(t)$ is really a distribution and not a true function, we immediately see
the difficulties in defining it as a stochastic process. Working through the details,
one would find that our current probability framework does not allow for a stochastic
process with the desired properties of white noise. Yet as we previously mentioned,
delta-correlation is the most common definition used for white noise in engineering
and physics. Fortunately, we will be able to use the Wiener process, which does have a
rigorous mathematical definition, to formally handle a process like that in Eq. (2.34).

^3 Note that this goes to zero since $t = s$ on a set of measure zero, so the expectation
factors and goes to zero. Of course, one really expects to get a delta function here, but
as we soon see, that has a different mathematical meaning than these real-valued random
variables.

We loosely want to think of the Wiener process as the $N \to \infty$ limit of the
random walk

$$x_t(N) = \frac{1}{\sqrt{N}} \sum_{n=1}^{\lfloor Nt \rfloor} \xi_n, \tag{2.39}$$

where $\xi_n$ are i.i.d. random variables with zero mean and unit variance. The idea
is that in the infinite limit, the central limit theorem tells us that such a normalized
sum of i.i.d. random variables is Gaussian distributed. Unfortunately, that theorem does not
apply for the entire stochastic process $\{x_t(N) : t \in \mathbb{R}_+\}$, which has uncountably
many elements. But it is good enough for any finite number of elements from this
collection, which allows us to define the Wiener process as follows.

Definition 2.18. A Wiener process $W_t$ is a stochastic process with continuous
trajectories and for which, given any set of times $t_1 < t_2 < \cdots < t_n$, $n < \infty$, the increments
$W_{t_1}, W_{t_2} - W_{t_1}, \ldots, W_{t_n} - W_{t_{n-1}}$ are independent Gaussian random variables with
zero mean and respective variances $t_1, t_2 - t_1, \ldots, t_n - t_{n-1}$.

It turns out that proving the existence of such a process is more involved than worth
detailing for our purposes (see van Handel [2007, 3.2]). One can also show that the
Wiener process is unique in the sense that any two processes $W_t, V_t$ which satisfy the
above definition give rise to the same probability law, e.g. $\mathbb{E}[f(W_t)] = \mathbb{E}[f(V_t)]$. It
can also be shown that with unit probability, the sample paths of a Wiener process
are continuous everywhere but differentiable nowhere.
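
The random-walk picture in Eq. (2.39) can be explored numerically. The sketch below is a Monte Carlo illustration under assumed choices: fair $\pm 1$ steps for $\xi_n$ (which have zero mean and unit variance), $N = 100$, and 2000 sample paths. It estimates the variance of $x_t(N)$ at $t = 1/2$ and $t = 1$, which should be close to the Wiener-process values $t$.

```python
import math
import random

def scaled_walk(t, N, rng):
    """x_t(N) = N**-0.5 times the sum of the first floor(N*t) i.i.d. +/-1 steps."""
    steps = int(math.floor(N * t))
    return sum(rng.choice((-1.0, 1.0)) for _ in range(steps)) / math.sqrt(N)

def sample_variance(xs):
    """Unbiased sample variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

rng = random.Random(0)        # fixed seed so the run is repeatable
N, paths = 100, 2000
samples_half = [scaled_walk(0.5, N, rng) for _ in range(paths)]
samples_one = [scaled_walk(1.0, N, rng) for _ in range(paths)]

var_half = sample_variance(samples_half)   # expected to be near 0.5
var_one = sample_variance(samples_one)     # expected to be near 1.0
```

This only probes the marginal distributions at two times; it says nothing about path continuity, which is part of Definition 2.18.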

Given the above definition, there are some basic properties we can now consider.
First is that the Wiener process $W_t$ introduces a natural filtration $\mathcal{F}_t^W = \sigma\{W_s : s \leq t\}$.
Relatedly, given an arbitrary filtration $\mathcal{F}_t$, we say $W_t$ is an $\mathcal{F}_t$-Wiener process if
it is $\mathcal{F}_t$-adapted and $W_t - W_s$ is independent of $\mathcal{F}_s$ for any $t > s$. Two further properties
are considered in the following lemmas.

Lemma 2.1. An $\mathcal{F}_t$-Wiener process is an $\mathcal{F}_t$-martingale.

Proof. We want to show that $\mathbb{E}[W_t|\mathcal{F}_s] = W_s$ for $t > s$. Clearly $W_s = \mathbb{E}[W_s|\mathcal{F}_s]$ since
$W_s$ is $\mathcal{F}_s$-measurable, which allows us to rewrite the condition as $\mathbb{E}[W_t - W_s|\mathcal{F}_s] = 0$.
But we just stated that $W_t - W_s$ is independent of $\mathcal{F}_s$, so its conditional expectation
equals its mean, which is zero. □

Definition 2.19. An $\mathcal{F}_t$-Markov process is an $\mathcal{F}_t$-adapted process $X_t$ such that
$\mathbb{E}[f(X_t)|\mathcal{F}_s] = \mathbb{E}[f(X_t)|X_s]$ for all $t > s$ and bounded, measurable $f$.

Lemma 2.2. An $\mathcal{F}_t$-Wiener process is an $\mathcal{F}_t$-Markov process.

Physicists are very familiar with Markov processes which describe a statistical process 
with no memory. In the formal definition, this is manifest in that the expectation of 
any future function of the process depends only on the value of the process now. This 
is the same as saying the future statistical properties of the process are completely 
determined by its current value. It is certainly reassuring that Brownian motion, 
represented by the Wiener process, satisfies this property. 

2.2.2 The Itô Integral

In our steady march towards a mathematical model of dynamic stochastic processes,
we are now ready to consider defining stochastic integrals of Gaussian white noise,
e.g. $\int_0^t f_s \xi_s \, ds$. Of course, given the discontinuity and non-differentiability of $\xi_s$, we
instead hope to define an integral over the Wiener process, e.g. $\int_0^t f_s \, dW_s$, which is
at least continuous. An obvious approach would be in terms of the Stieltjes integral,
which is an appropriate generalization of the Riemann integral to non-differentiable
integrators. For our purposes, this means we define a sequence of refining partitions
$\pi_n$ of the time interval of integration $[0, t]$ so that we may write

$$\int_0^t f_s \, dW_s = \lim_{n \to \infty} \sum_{t_i \in \pi_n} f_{t_i} (W_{t_{i+1}} - W_{t_i}), \tag{2.40}$$

where the $t_i$ make up the partition $\pi_n$ of $[0, t]$. It is certainly not clear that this
limit converges and does so independently of the choice of partitions $\pi_n$. This is
especially worrisome given the non-differentiability of $W_t$. Perhaps as anticipated,
a rigorous consideration shows that this stochastic integral formulation does not
converge uniquely and depends sensitively on the choice of approximating sequence;
there are actually examples where the sequence can be chosen so that the integral
converges to any desired function!

The source of the troubles comes from the fact that the Wiener process has infinite
total variation over any interval. Total variation is the total distance your finger
would have to travel tracing out the contour of the Wiener process over the given
interval. This is infinite for any interval. As a description of a physical process, this
is clearly absurd! A particle undergoing Brownian motion would surely require an
infinite amount of energy to travel an infinite distance. Of course, a Wiener process
is an idealization of a true physical model, but this seemingly undesirable property is
an important consequence of the properties of white noise that we do want to model
(delta-correlated, martingale, Markov). Consider that even if the total displacement
$|f(t) - f(s)|$, $t > s$, is small, the function can still oscillate very rapidly within that
interval to get a large total variation; the non-differentiable Wiener process therefore
oscillates extraordinarily rapidly over any such interval. Loosely speaking, the whole
trouble boils down to the fact that no matter how fine the partition $\pi$, you don't get
any better handle on the Wiener increments; the infinite variation means you will
never get a level of detail independent of the choice of partition.

Fortunately, one can show that even though the total variation of a Wiener process
is infinite, the quadratic variation is finite, e.g. for the interval $[0, 1]$ and any sequence
of refining partitions $\pi_n$,

$$\lim_{n \to \infty} \sum_{t_i \in \pi_n} (W_{t_{i+1}} - W_{t_i})^2 = 1 \quad \text{in } \mathcal{L}^2. \tag{2.41}$$

Thus rather than having the stochastic integral converge almost surely (a.s.), we can
instead consider convergence in $\mathcal{L}^2$. More exactly, for some random variable $X$ and
a sequence $X_n$, we say that $X_n \to X$ a.s. if $\mathbb{P}(\{\omega \in \Omega : X_n(\omega) \to X(\omega)\}) = 1$. We
say that $X_n \to X$ in $\mathcal{L}^2$ if $\|X_n - X\|_2 \to 0$ as $n \to \infty$. There are several types
of convergence for sequences of random variables which are related in sometimes
unintuitive ways. See van Handel [2007] for more discussion.
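
The contrast between total and quadratic variation in Eq. (2.41) shows up clearly in simulation. The sketch below is illustrative only: it draws independent Gaussian increments on uniform grids of $[0, 1]$ (an assumption; each grid size uses a fresh sample path rather than refining a single path) and accumulates both sums.

```python
import math
import random

def wiener_increments(T, n, rng):
    """n independent N(0, T/n) increments of a Wiener path on [0, T]."""
    sd = math.sqrt(T / n)
    return [rng.gauss(0.0, sd) for _ in range(n)]

rng = random.Random(1)        # fixed seed so the run is repeatable
results = {}
for n in (2**8, 2**12, 2**16):
    dW = wiener_increments(1.0, n, rng)
    total_variation = sum(abs(d) for d in dW)        # grows roughly like sqrt(n)
    quadratic_variation = sum(d * d for d in dW)     # settles near T = 1
    results[n] = (total_variation, quadratic_variation)
```

As the grid is refined the summed absolute increments keep growing, while the summed squared increments concentrate around the interval length.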

Taking the $\mathcal{L}^2$ approach, consider the simple, square-integrable $\mathcal{F}_t^W$-adapted
process $X_t^n$. The first two properties suggest there are a series of $N < \infty$
non-random jump times $t_i$ (though this could be relaxed) such that $X_{t_i}^n$ is a constant
$\mathcal{F}_{t_i}^W$-measurable random variable in $\mathcal{L}^2$. That is, since the stochastic process is
"simple", there are a finite number of times where it jumps to different values. For
simplicity, we assume these times are the same for all $\omega$. The idea is to leverage the
fact that more general $X_t$ processes are limits of simple processes $X_t^n$ and if we can
define the integral consistently for the latter, the former will inherit the definition.

It is fairly straightforward to define a consistent integral for $X_t^n$:

$$I(X^n) = \int_0^T X_t^n \, dW_t = \sum_{i=0}^{N} X_{t_i}^n (W_{t_{i+1}} - W_{t_i}). \tag{2.42}$$

We now want to show that a sequence of such integrals will converge in $\mathcal{L}^2$ to a
particular integral for some $X_t$, independent of the approximations $X_t^n$. To do so,
we make use of the following isometry.

Lemma 2.3 (Itô Isometry). Let $X_t^n$ be the simple, square-integrable, $\mathcal{F}_t^W$-adapted
process discussed above. Then,

$$\mathbb{E}\left[\left(\int_0^T X_t^n \, dW_t\right)^2\right] = \mathbb{E}\left[\int_0^T (X_t^n)^2 \, dt\right]. \tag{2.43}$$

Proof.

$$\mathbb{E}\left[\left(\int_0^T X_t^n \, dW_t\right)^2\right] = \sum_{i,j=0}^{N} \mathbb{E}\left[X_{t_i}^n X_{t_j}^n (W_{t_{i+1}} - W_{t_i})(W_{t_{j+1}} - W_{t_j})\right]. \tag{2.44}$$

Now assume $i \neq j$. From the Wiener process definition (2.18) and properties, we
know that increments are independent for disjoint time intervals and moreover,
$W_t - W_s$ is independent of $\mathcal{F}_s$ for any $t > s$. Without loss of generality,
assume $t_i > t_j$. Then $(W_{t_{i+1}} - W_{t_i})$ is independent of $X_{t_i}^n$, which is $\mathcal{F}_{t_i}$-measurable, and
independent of $X_{t_j}^n$, which is $\mathcal{F}_{t_j}$-measurable. Since it is also over a different interval
than $(W_{t_{j+1}} - W_{t_j})$, we can factor its expectation completely and calculate it to be
zero by definition. This leaves terms for which $i = j$, in which case we have

$$\mathbb{E}\left[\left(\int_0^T X_t^n \, dW_t\right)^2\right] = \sum_i \mathbb{E}\left[(X_{t_i}^n)^2\right] \mathbb{E}\left[(W_{t_{i+1}} - W_{t_i})^2\right]$$
$$= \sum_i \mathbb{E}\left[(X_{t_i}^n)^2\right] (t_{i+1} - t_i) = \mathbb{E}\left[\int_0^T (X_t^n)^2 \, dt\right]. \tag{2.45}$$

Note that the fact that $X^n \in \mathcal{L}^2$ is necessary for convergence to the final integral. □
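
A Monte Carlo estimate gives a rough numerical check of the isometry for one simple integrand. The sketch below assumes the adapted choice $X_{t_i} = W_{t_i}$ on a uniform grid of 50 steps over $[0, 1]$ and 5000 sample paths; these values, and the helper `simple_ito_integral`, are chosen only for illustration. For this integrand the right-hand side of Eq. (2.45) is $\sum_i t_i (t_{i+1} - t_i)$, which can be summed exactly.

```python
import math
import random

def simple_ito_integral(T, n, rng):
    """Sum_i X_{t_i} (W_{t_{i+1}} - W_{t_i}) with the adapted choice X_{t_i} = W_{t_i}."""
    dt = T / n
    w, integral = 0.0, 0.0
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))
        integral += w * dw      # integrand uses only the value at the left endpoint
        w += dw
    return integral

rng = random.Random(2)          # fixed seed so the run is repeatable
T, n, paths = 1.0, 50, 5000
values = [simple_ito_integral(T, n, rng) for _ in range(paths)]

lhs = sum(v * v for v in values) / paths               # estimate of E[I(X)^2]
rhs = sum((i * T / n) * (T / n) for i in range(n))     # sum_i E[W_{t_i}^2] (t_{i+1} - t_i)
mean_value = sum(values) / paths                       # estimate of E[I(X)]
```

The sample second moment should agree with the exact sum up to statistical error, and the sample mean should be near zero.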



Recall that an isometry is a distance preserving map between two metric spaces.
The property under consideration is an isometry if we consider the process $X^n$ as
a measurable map on $[0, T] \times \Omega$, which admits a natural product measure $\mu_T \times \mathbb{P}$,
where $\mu_T$ is the Lebesgue measure on $[0, T]$, which is simply $T$ times the uniform
probability measure on the interval. Using this definition, we see that the Itô Isometry
can be written as

$$\|I(X^n)\|_{2,\mathbb{P}} = \|X^n\|_{2,\mu_T \times \mathbb{P}}, \tag{2.46}$$

where the left-hand term is the $\mathcal{L}^2$-norm on $\Omega$ and the right-hand term is the $\mathcal{L}^2$-norm
on $[0, T] \times \Omega$. This isometry preserves the $\mathcal{L}^2$-distance for $\mathcal{F}_t^W$-adapted simple
integrands as

$$\|I(X^n) - I(Y^n)\|_{2,\mathbb{P}} = \|X^n - Y^n\|_{2,\mu_T \times \mathbb{P}}. \tag{2.47}$$

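But the beauty is that one can show^4 for some $X \in \mathcal{L}^2(\mu_T \times \mathbb{P})$ that if there exists
some sequence of simple integrands such that

$$\lim_{n \to \infty} \|X^n - X\|_{2,\mu_T \times \mathbb{P}}^2 = \lim_{n \to \infty} \mathbb{E}\left[\int_0^T (X_t^n - X_t)^2 \, dt\right] = 0, \tag{2.48}$$

then $I(X)$ can be defined uniquely as the limit in $\mathcal{L}^2(\mathbb{P})$ of the simple integrals
$I(X^n)$! This turns out to be true for any $\mathcal{F}_t^W$-adapted process in $\mathcal{L}^2(\mu_T \times \mathbb{P})$ and
gives rise to the following definition of the stochastic integral.

Definition 2.20. Consider the $\mathcal{F}_t^W$-adapted stochastic process $X_t$ in $\mathcal{L}^2(\mu_T \times \mathbb{P})$.
The Itô integral

$$I(X) = \int_0^T X_t \, dW_t \tag{2.49}$$

is defined as the unique limit in $\mathcal{L}^2(\mathbb{P})$ of simple integrals $I(X^n)$.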

One can show that the Itô integral has continuous sample paths, is an $\mathcal{F}_t^W$-martingale
and satisfies^5

$$\mathbb{E}\left[\int_0^T X_t \, dW_t\right] = \int_0^T \mathbb{E}[X_t \, dW_t] = \int_0^T \mathbb{E}[X_t] \, \mathbb{E}[dW_t] = 0, \tag{2.50}$$

where the fact that $X_t$ is $\mathcal{F}_t^W$-adapted means it is independent of $dW_t$ (it is a
non-anticipative function) and the expectation may be factored.

In short, the Itô integral is defined uniquely as a converging sequence of
approximations in $\mathcal{L}^2$, where we leverage the fact that simple stochastic processes converge
uniquely to show that the Itô integral also converges. The fact that it is limited to
$\mathcal{F}_t^W$-adapted processes is not a significant restriction for us, especially given
the resulting useful properties we recover, including having zero expectation and
being a martingale. Indeed, one approach towards the filtering problem is based on the
following relationship between arbitrary martingales and Itô integrals.

^4 Essentially, one shows that the approximating sequence is a Cauchy sequence in $\mathcal{L}^2$,
after which convergence is easy.

^5 These properties actually depend on localizing the Itô integral, which amounts to
defining it on arbitrarily long time intervals. Extending to the infinite interval is difficult.

Lemma 2.4 (Martingale Representation). Consider the $\mathcal{F}_t^W$-martingale $M_t \in \mathcal{L}^2(\mathbb{P})$.
Then there exists a unique $\mathcal{F}_t^W$-adapted process $H_t$ such that

$$M_t = M_0 + \int_0^t H_s \, dW_s. \tag{2.51}$$

This lemma is extremely useful in that if we can show that some stochastic process is
a martingale with respect to the Wiener filtration $\mathcal{F}_t^W$, we know it can be expressed
as an Itô integral. As we shall soon find in the following section, this is equivalent to
showing that the process admits a stochastic differential equation analogous to the
desired trajectory in (2.34).

2.2.3 Stochastic Differential Equations 

We are now finally in a position to consider dynamical processes driven by white
noise. The basic idea is that the time evolution of complicated stochastic processes
can be expressed simply in terms of the basic Wiener process, whose statistics and
properties are well known to us, and an appropriate deterministic term. This is
often the route taken in statistical physics, where trajectories are written as Langevin
equations. Unfortunately, the ordinary differential equation picture we had in mind
in Eq. (2.34) is not useful, as there is no way to express Gaussian white noise directly
as a sensible mathematical object. However, our success in defining the Itô integral
suggests that we can deal sensibly with the integral of the noise process, written in
terms of Wiener increments, which gives a form

$$X_t = X_0 + \int_0^t a(s, X_s) \, ds + \int_0^t b(s, X_s) \, dW_s. \tag{2.52}$$

But given our predilection for differential equations, we often express the above
integral as a stochastic differential equation (SDE), written

$$dX_t = a(t, X_t) \, dt + b(t, X_t) \, dW_t, \tag{2.53}$$

where differentials are used to remind us that this is not a true derivative equation,
as $dW_t/dt$ is not a well-defined mathematical object. The SDE form is really no more
than a notational convenience for referring to the more accurate integral form. This
convenience is most obvious when considering functions of such stochastic processes,
for which the normal chain rule of calculus no longer holds. This is seen in
Example 2.4 and Theorem 2.3 below.

Example 2.4 (Based on [van Handel 2007, Chap. 4]). During our introductory calculus
course, we are quickly inculcated with algebraic rules for evaluating derivatives
and integrals of a variety of functional forms. One familiar rule is for powers and
reads

$$\int_0^T X_t\,dX_t = \int_{X_0}^{X_T} u\,du = \left.\frac{u^2}{2}\right|_{X_0}^{X_T} \tag{2.54}$$


Does this hold if $X_t = W_t$? We can check by explicitly calculating the integral.
Given that the Ito integral is defined in terms of a convergent sequence in $\mathcal{L}^2$, we
take the approximating simple versions of $W_t$ to be $W_t$ taken at jump times given
by dyadic rationals. We will not show it, but such an approximation does converge
to $W_t$ appropriately and is therefore a valid expansion of the stochastic integral.
Writing this out, we have

$$\int_0^T W_t\,dW_t = \mathcal{L}^2\text{-}\lim_{n\to\infty} \sum_{k=0}^{2^n-1} W_{k2^{-n}T}\left(W_{(k+1)2^{-n}T} - W_{k2^{-n}T}\right) \tag{2.55}$$

$$= \mathcal{L}^2\text{-}\lim_{n\to\infty} \frac{1}{2}\left[W_T^2 - \sum_{k=0}^{2^n-1}\left(W_{(k+1)2^{-n}T} - W_{k2^{-n}T}\right)^2\right] \tag{2.56}$$

where we have simply rearranged terms in the sum. We note that the second term
converges in $\mathcal{L}^2$ to the total quadratic variation, so that

$$\int_0^T W_t\,dW_t = \frac{1}{2}\left[W_T^2 - T\right]. \tag{2.57}$$

But this is not the same as the familiar calculus rule in Eq. (2.54), which indicates
(noting $W_0 = 0$),

$$\int_0^T W_t\,dW_t = \frac{1}{2}W_T^2. \tag{2.58}$$

Clearly, the Ito integral is a more complicated beast. Fortunately, the following 
theorem shows that only a slightly modified chain rule is needed. 
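
The discrepancy between Eqs. (2.57) and (2.58) is easy to see numerically. The following is a minimal sketch, assuming NumPy and a single simulated Wiener path on a dyadic grid; the function name `ito_sum_w_dw`, the grid depth and the seed are illustrative choices rather than anything fixed by the text. It evaluates the left-endpoint sum of Eq. (2.55) and compares it with both candidate answers.

```python
import numpy as np

def ito_sum_w_dw(T=1.0, n_levels=14, seed=0):
    """Left-endpoint (Ito) sum of W dW on the dyadic grid with 2**n_levels steps."""
    rng = np.random.default_rng(seed)
    n_steps = 2 ** n_levels
    # Independent Wiener increments, each with variance T / n_steps
    dW = rng.normal(0.0, np.sqrt(T / n_steps), size=n_steps)
    # Sample path on the grid, starting from W_0 = 0
    W = np.concatenate(([0.0], np.cumsum(dW)))
    # Integrand evaluated at the left endpoint of each interval, as in Eq. (2.55)
    ito_sum = np.sum(W[:-1] * dW)
    return W[-1], ito_sum

T = 1.0
W_T, ito_sum = ito_sum_w_dw(T=T)
print("Ito sum, Eq. (2.55)     :", ito_sum)
print("(W_T^2 - T) / 2, (2.57) :", 0.5 * (W_T**2 - T))
print("W_T^2 / 2, (2.58)       :", 0.5 * W_T**2)
```

On a fine grid the sum should sit close to the value in Eq. (2.57) and differ from the ordinary calculus answer by roughly T/2, which is the quadratic variation term.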

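Theorem 2.3 (Ito Rule, one dimension). Consider the stochastic process $X_t$ with
stochastic differential equation

$$dX_t = a(t, X_t)\,dt + b(t, X_t)\,dW_t \tag{2.59}$$

Now consider a function $f(t, X_t)$ that is differentiable with respect to its first argument
and twice differentiable with respect to its second. This function then satisfies
the stochastic differential equation

$$df(t, X_t) = \left[\frac{\partial f}{\partial t} + a(t, X_t)\frac{\partial f}{\partial x} + \frac{1}{2}b(t, X_t)^2\frac{\partial^2 f}{\partial x^2}\right]dt \tag{2.60}$$

$$\qquad + b(t, X_t)\frac{\partial f}{\partial x}\,dW_t \tag{2.61}$$

where higher order differentials were evaluated according to $dt\,dW_t = dt^2 = 0$ and
$dW_t^2 = dt$.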

Lemma 2.5 (Ito product rule). The Ito product rule for stochastic processes
$X_t, Y_t$ is

$$d(X_t Y_t) = dX_t\,Y_t + X_t\,dY_t + dX_t\,dY_t. \tag{2.62}$$

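Lemma 2.6 (Ito Rule, multidimensional). Consider the n-dimensional stochastic
process $X_t : \mathbb{R}_+ \times \Omega \to \mathbb{R}^n$, written

$$dX_t = a(t, X_t)\,dt + \sum_{k=1}^m b^k(t, X_t)\,dW_t^k = a(t, X_t)\,dt + b(t, X_t)\,dW_t \tag{2.63}$$

where $a(t, X_t), b^k(t, X_t) : \mathbb{R}_+ \times \mathbb{R}^n \to \mathbb{R}^n$ and each $W_t^k$ is an independent Wiener process.
If we collect these into the m-dimensional Wiener process $W_t = (W_t^1, \ldots, W_t^m)$
and introduce $b(t, X_t) : \mathbb{R}_+ \times \mathbb{R}^n \to \mathbb{R}^{n \times m}$, we may use the more compact form
on the right.

Further consider the transformed process $Y_t = g(t, X_t) : \mathbb{R}_+ \times \mathbb{R}^n \to \mathbb{R}^p$, where p
is not necessarily equal to n. Then $Y_t$ satisfies the SDE

$$dY_t^k = \frac{\partial g^k}{\partial t}\,dt + \sum_i \frac{\partial g^k}{\partial X_t^i}\,dX_t^i + \frac{1}{2}\sum_{i,j} \frac{\partial^2 g^k}{\partial X_t^i\,\partial X_t^j}\,dX_t^i\,dX_t^j \tag{2.64}$$

where the superscript indicates the i,j,k-th entry in the vector and second order
differentials are evaluated using $dt\,dW_t^i = dt^2 = 0$ and $dW_t^i\,dW_t^j = \delta_{ij}\,dt$.

For the simple case when p = 1, we may use the definition of $X_t$ to conveniently
write this as

$$dY_t = \mathscr{L}_t g(t, X_t)\,dt + \left(\nabla g(t, X_t)\right)^\top b(t, X_t)\,dW_t \tag{2.65a}$$

$$\mathscr{L}_t g(t, X_t) = \frac{\partial g}{\partial t} + \left(\nabla g(t, X_t)\right)^\top a(t, X_t) + \frac{1}{2}\sum_{i,j=1}^n \sum_{k=1}^m b^{ik}(t, X_t)\,b^{jk}(t, X_t)\,\frac{\partial^2 g}{\partial X_t^i\,\partial X_t^j} \tag{2.65b}$$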

The Ito rule is really no more than a Taylor expansion followed by a careful
consideration of the $\mathcal{L}^2$-convergence of the resulting terms. Not surprisingly, all
terms which are a product of dt and any other differential tend to zero. However,
one also finds that $dW_t^2$ converges to dt in $\mathcal{L}^2$, which is effectively a restatement
of the Ito Isometry in Lemma 2.3. At a heuristic level, many people often express
$dW_t$ as $\sqrt{dt}\,\xi_t$, where $\xi_t$ is a mean-zero, Gaussian random variable with unit variance.
Then it is clear that any consistent chain rule which retains terms to first order in
dt must also retain the term for $dW_t^2$.

The upside is that we have an integral which retains statistically pleasing properties:
a mean-zero stochastic term driven by white noise which is also a martingale.
At the same time, we also have an algebraic formalism for transforming SDE representations
of more complicated stochastic processes, at the small cost of having to
add an extra term to the usual chain rule.
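
To see the extra term at work, consider a sketch with geometric Brownian motion, $dX_t = \mu X_t\,dt + \sigma X_t\,dW_t$, which is an illustrative choice and not a model used elsewhere in this chapter. Applying the Ito rule to $f = \log x$ gives $d\log X_t = (\mu - \sigma^2/2)\,dt + \sigma\,dW_t$, whereas the ordinary chain rule would omit the $-\sigma^2/2$. The code below integrates the SDE directly with the Euler-Maruyama scheme (a standard discretization that the text does not discuss) and compares the endpoint against both predictions on the same noise path. The helper name `euler_maruyama` and all parameter values are assumptions.

```python
import numpy as np

def euler_maruyama(a, b, x0, dW, dt):
    """Integrate dX = a(t, X) dt + b(t, X) dW over the given Wiener increments."""
    x = x0
    for k, dw in enumerate(dW):
        t = k * dt
        x = x + a(t, x) * dt + b(t, x) * dw
    return x

mu, sigma, x0, T, n_steps = 0.5, 0.8, 1.0, 1.0, 20000
dt = T / n_steps
rng = np.random.default_rng(1)
dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
W_T = dW.sum()

# Direct numerical integration of the Ito SDE
x_em = euler_maruyama(lambda t, x: mu * x, lambda t, x: sigma * x, x0, dW, dt)
# Ito rule with f = log x: d(log X) = (mu - sigma^2 / 2) dt + sigma dW
x_ito = x0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * W_T)
# Ordinary chain rule, which drops the second-order term
x_naive = x0 * np.exp(mu * T + sigma * W_T)

print("Euler-Maruyama      :", x_em)
print("Ito rule prediction :", x_ito)
print("naive chain rule    :", x_naive)
```

The simulated endpoint should track the Ito prediction, while the naive result is off by the factor $e^{\sigma^2 T/2}$.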

2.2.4 Wong-Zakai Theorem and Stratonovich Integrals 

Even though we have made significant progress, one might still be concerned that 
the Ito formalism is simply a mathematical construct that has no connection to any 
real-world stochastic process. Should we really be so blithe in throwing away the 
usual chain rule? Given the arbitrariness of the Stieltjes stochastic integral, what 
was the justification for choosing the Ito construction? If the use of white noise is an 
approximation to begin with, how faithfully does the Ito SDE capture it? All of these 
questions are related and are well-appreciated in the study of stochastic processes. 

To make the issue more precise, consider the standard ordinary differential equation
driven by a fluctuating, but not white, noise term

$$\frac{d}{dt}X_t^n = a(t, X_t^n) + b(t, X_t^n)\,\xi_t^n. \tag{2.66}$$

We assume $\xi_t^n$ is a sensible noise process whose sample paths are piecewise continuous.
We are interested in the case that this approximates a true Gaussian white noise
process in the sense that

$$W_t^n = \int_0^t \xi_s^n\,ds \tag{2.67}$$

approximates the Wiener process. As the process becomes more and more singular,
the question is how to interpret the resulting stochastic differential equation. The
following theorem, due to Wong and Zakai [1965], tells us what to do.

Theorem 2.4 (Wong-Zakai Theorem). Given the ordinary differential equation
of the form

$$\frac{d}{dt}X_t^n = a(t, X_t^n) + b(t, X_t^n)\,\xi_t^n \tag{2.68}$$

where $\xi_t^n$ converges uniformly to Gaussian white noise as $n \to \infty$, the solution
converges a.s. as $n \to \infty$ to

$$dX_t = a(t, X_t)\,dt + b(t, X_t) \circ dW_t \tag{2.69}$$

where the stochastic term is interpreted in the Stratonovich sense.

[Footnote: Uniform convergence here means $\lim_{n\to\infty} \sup_t |W_t - W_t^n| = 0$, where $W_t^n = \int_0^t \xi_s^n\,ds$. That is, in some limit, the time integral of $\xi_t^n$ uniformly approximates the Wiener process.]

Definition 2.21. The Stratonovich integral

$$\int_0^T X_t \circ dW_t \tag{2.70}$$

is defined as the unique limit in $\mathcal{L}^2(P)$ of the simple integrals

$$\int_0^T X_t \circ dW_t = \lim_{n\to\infty} \sum_i \frac{1}{2}\left(X_{t_{i+1}} + X_{t_i}\right)\left(W_{t_{i+1}} - W_{t_i}\right). \tag{2.71}$$

The Stratonovich integral obeys the standard calculus chain rules, but has non-trivial
expectation and is not a martingale.

Gadzooks! Wong and Zakai tell us that any physical process, which naturally 
obeys the normal rules of calculus, results in a Stratonovich integral in a white noise 
limit. This is not a complete surprise, as the Stratonovich integral obeys the normal 
chain rule and taking a limit of processes which also obey the chain rule shouldn't 
break that property. But remember that the formulation of the Ito integral was a 
choice of how to overcome the lack of an unambiguous convergence of stochastic 
integrals. The Stratonovich form is just a different choice in defining a stochastic 
integral. For deterministic integrals, any choice of increments converges to the same 
Riemann integral, so we didn't have to worry about which formulation is used. For 
stochastic integrals, the Wong-Zakai theorem tells us how to interpret an SDE which
arises from taking a physical limit; after that, we are free to choose which form to
use. If the two forms were not related, then the Ito definition would be useless for
studying physical systems driven by approximate white noise. Fortunately, it turns
out that the two formulations are simply related.

Lemma 2.7. The solution of the multi-dimensional Ito SDE

$$dX_t = a(t, X_t)\,dt + b(t, X_t)\,dW_t \tag{2.72}$$

is also the solution of a corresponding Stratonovich SDE, written

$$dX_t = \tilde{a}(t, X_t)\,dt + b(t, X_t) \circ dW_t \tag{2.73}$$

with

$$\tilde{a}^j(t, X_t) = a^j(t, X_t) - \frac{1}{2}\sum_{k=1}^m \sum_{i=1}^n b^{ik}(t, X_t)\,\frac{\partial b^{jk}(t, X_t)}{\partial X_t^i} \tag{2.74}$$

where the superscripts denote the corresponding entries of the vector a and the matrix b.

We see then that it is straightforward to convert between the two forms, only
needing to account for the Ito drift term. This term accounts for the loss of the
non-anticipative property for the Stratonovich Wiener increment. That is, the stochastic
process multiplying the noise increment no longer occurs at an independent time
interval, which effectively couples the noise at different times and is why we lose the
nice statistical properties. Nonetheless, after using the Wong-Zakai theorem to derive
a Stratonovich SDE from a physical model, we simply convert to the equivalent Ito
form to make our calculations easier. This duality will prove useful in Chapter 4
when we study the techniques of projection filtering, which require a valid chain rule
consistent with differential manifolds; this is one of the few circumstances in which the
Stratonovich form will be preferred.
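
The conversion can be checked on a simple scalar case. The sketch below assumes the Stratonovich SDE $dX_t = X_t \circ dW_t$ with $X_0 = 1$, whose solution by the ordinary chain rule is $X_t = e^{W_t}$; the scalar form of the conversion gives the equivalent Ito SDE $dX_t = \frac{1}{2}X_t\,dt + X_t\,dW_t$. The Euler-Maruyama and Heun schemes used here are standard discretizations that converge to the Ito and Stratonovich solutions respectively; they are not taken from the text, and all function names and parameters are illustrative.

```python
import numpy as np

def ito_drift_from_stratonovich(a_strat, b, db_dx):
    """Scalar drift conversion: a = a_tilde + (1/2) b db/dx."""
    return lambda t, x: a_strat(t, x) + 0.5 * b(t, x) * db_dx(t, x)

def euler_maruyama(a, b, x0, dW, dt):
    """Left-endpoint scheme; converges to the Ito solution."""
    x = x0
    for k, dw in enumerate(dW):
        t = k * dt
        x = x + a(t, x) * dt + b(t, x) * dw
    return x

def heun_stratonovich(a, b, x0, dW, dt):
    """Predictor-corrector scheme; converges to the Stratonovich solution."""
    x = x0
    for k, dw in enumerate(dW):
        t = k * dt
        x_pred = x + a(t, x) * dt + b(t, x) * dw
        x = (x + 0.5 * (a(t, x) + a(t + dt, x_pred)) * dt
               + 0.5 * (b(t, x) + b(t + dt, x_pred)) * dw)
    return x

T, n_steps = 1.0, 20000
dt = T / n_steps
rng = np.random.default_rng(4)
dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
W_T = dW.sum()

zero = lambda t, x: 0.0      # Stratonovich drift a_tilde
b = lambda t, x: x           # noise coefficient
db_dx = lambda t, x: 1.0     # its derivative

a_ito = ito_drift_from_stratonovich(zero, b, db_dx)
x_exact = np.exp(W_T)                                     # ordinary chain rule solution
x_heun = heun_stratonovich(zero, b, 1.0, dW, dt)          # Stratonovich form
x_em_converted = euler_maruyama(a_ito, b, 1.0, dW, dt)    # Ito form with converted drift
x_em_unconverted = euler_maruyama(zero, b, 1.0, dW, dt)   # Ito form, drift term forgotten

print(x_exact, x_heun, x_em_converted, x_em_unconverted)
```

The first three values should agree closely. The last one, which reads the same coefficients as an Ito SDE without adjusting the drift, approximates $e^{W_T - T/2}$ instead.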

2.2.5 Summary 

The goal of the second part of this chapter was to introduce time into our theory 
of probability. This allowed us to consider stochastic processes, which are random 
variables that are a function of time. Our hope of writing a stochastic process 
driven by white noise was hampered at first, as we learned that white noise has
no sensible mathematical representation as a stochastic process. Fortunately, we
were able to work with the integral of white noise in terms of the Wiener process, 
which in turn allowed us to define more general stochastic processes as Ito integrals 
against the Wiener process. This gave rise to stochastic differential equations, which 
are dynamical equations for the evolution of stochastic trajectories involving both 
deterministic and stochastic terms. Due to the subtleties of the Ito integral, we found 
that SDEs obey a modified chain rule which requires retaining terms to second order 
in Wiener increments. We also found that the physical limit of increasingly better 
approximations of white noise converges to a Stratonovich, rather than an Ito, SDE. 
Fortunately, we found that a given stochastic process has an equivalent representation 
in either form, so that the statistically superior properties of the Ito integral may be 
used in analysis. 



2.3 Classical Filtering Theory 

Using the techniques we have developed thus far, we are finally ready to tackle the 
filtering problem. We consider an n-dimensional, unobserved stochastic process Xt, 
governed by the SDE 

$$dX_t = a(t, X_t)\,dt + b(t, X_t)\,dW_t \qquad \text{"system"} \tag{2.75}$$

and a related m-dimensional observed stochastic process $Y_t$, governed by the SDE

$$dY_t = c(t, X_t)\,dt + d(t)\,dV_t \qquad \text{"observations/measurements"} \tag{2.76}$$

where $W_t$, $V_t$ are two independent Wiener processes of k and p dimensions, respectively.
Note that we have already imposed a particular structure on the stochastic
processes under consideration; they are driven by white noise and admit an SDE
description. Given the broad applicability of Gaussian white noise in physics and
related disciplines, limiting ourselves to this class of processes is not a significant
restriction, especially given the analytic results we will be able to derive.

[Footnote: Meaning a, b, c, d are bounded, $d^{-1}$ exists and is bounded and $X_t, Y_t$ have a unique
$\mathcal{F}_t$-adapted solution; some of these restrictions may be lifted with suitable care. Note that
we could easily extend the SDE formalism to include Poisson noise processes in addition
to Gaussian noise processes.]

Returning to the problem at hand, Eqs. (2.75) and (2.76) are known in control
theory as the system-observations pair and formalize the structure of the inference
problem. That is, the unobserved system $X_t$ undergoes a stochastic time-evolution.
We are interested in some property of the system, but only have access to the observations
$Y_t$. Unfortunately, $Y_t$ is not $\mathcal{F}_t^X$-measurable, since it involves the independent
noise process $dV_t$ and we therefore do not know $X_t$ after measuring $Y_t$. Fortunately,
$Y_t$ carries some information about the system, albeit of a set structure and corrupted
by the extra noise. Using the techniques of inference we have developed, we can still
construct an estimate of the system conditioned on the observations.

Definition 2.22. Given a system-observations pair as above, the filtering problem is
to calculate the least-squares best-estimate of the current state of the system given
the observations record. Mathematically, we write this as

$$\pi_t[X_t] = \mathbb{E}[X_t \,|\, \mathcal{F}_t^Y] \tag{2.77}$$

where $\mathcal{F}_t^Y$ is the filtration generated by the observations process up to time t.

Actually, there is a more general class of inference problems one could consider,
written

$$\pi_t[f(X_s)] = \mathbb{E}[f(X_s) \,|\, \mathcal{F}_t^Y] \tag{2.78}$$

where one estimates some arbitrary function of the state at an arbitrary time. If
s = t and f(X) = X, this is simply the filtering problem already discussed. For
s = 0 and f(X) = X, this is the smoothing problem, for which $\pi_t[X_0]$ is an estimate
of the initial state. For s > t and f(X) = X, this is the predictor problem, for which
$\pi_t[X_{s>t}]$ is an estimate of a future state. Choosing s to be an intermediary time or f
to be a more complicated function corresponds to other valid inference problems.

Nonetheless, the most relevant problem for our purposes is the filtering problem.
The rest of this section is devoted to developing a recursive formula for $\pi_t[f(X_t)]$,
written in shorthand as $\pi_t[f]$, so that for each differential observation increment $dY_t$,
we can readily update the filtered estimate

$$d\pi_t[f(X_t)] = q(t, X_t)\,dt + r(t, X_t)\,dY_t \tag{2.79}$$

for some functions q and r which we will need to determine. We will take f to be a
square-integrable real-valued function, so that to reconstruct the multi-dimensional
$X_t$, we would need a set of estimates $\pi_t[f^i]$, with functions $f^i(X_t) = X_t^i$. Making
f one-dimensional will greatly simplify the notation without losing any essential
details.

Our general approach is the reference probability method, which we will also use
to develop the quantum filter. The basic idea is rather simple; if $X_t$ and $Y_t$ were
independent, then the conditional expectation of $X_t$ amounts to a simple averaging.
If we can find a measure under which the two processes are independent, then it
will be trivial to evaluate the conditional expectation under this measure. Of course,
if $X_t$ and $Y_t$ were actually independent, the filtering problem would be pointless
since we would never learn anything about the state from the observations. So
we must also find a way to relate the calculation under the new measure back to
the original calculation under the old measure. The first two parts of this section
focus on developing these two relations, finding a measure under which the processes
are independent and another for relating conditional expectations under different
measures.

2.3.1 Girsanov's Theorem

In many areas of mathematics, a change of variables often simplifies a seemingly
difficult problem. In the domain of probability, a similar approach is to change the 
underlying probability measure, which may simplify the statistics of a random vari- 
able. We have already considered such a change using the Radon-Nikodym theorem 
(Thm. 2.2). Being able to make such a transformation is particularly useful for 
stochastic processes driven by Gaussian white noise, whose deterministic terms ob- 
fuscate many of the nice statistical properties of a pure Ito integral over the Wiener 
process. The following theorem shows how to construct a new measure under which 
such a statistically complicated stochastic process becomes a Wiener process. 

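Theorem 2.5 (Girsanov). Let $W_t$ be an n-dimensional $\mathcal{F}_t$-Wiener process on
$(\Omega, \mathcal{F}, P)$ with filtration $\mathcal{F}_t$. Also consider the n-dimensional stochastic process $X_t$
governed by the SDE

$$dX_t = F_t\,dt + dW_t, \qquad t \in [0, T] \tag{2.80}$$

Assuming $F_t$ is Ito integrable, define

$$\Lambda = \exp\left[-\int_0^T F_s^\top\,dW_s - \frac{1}{2}\int_0^T \|F_s\|^2\,ds\right] \tag{2.81}$$

If $\mathbb{E}_P[\Lambda] = 1$, then $X_t$ is an $\mathcal{F}_t$-Wiener process under $Q(A) = \mathbb{E}_P(\Lambda \chi_A)$.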



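Proof. For simplicity, we will prove this result for a one-dimensional process. For a
more general proof, see Theorem 4.5.3 in van Handel [2007], the first half of which
is essentially reproduced here. Recall from Definition 2.18 that a Wiener process is
characterized by continuous sample paths and independent, Gaussian distributed
increments with zero mean and variance equal to the interval length. Given that $X_t$
is written as an SDE, it has continuous sample paths by construction. In order to
show the increment properties, we consider a given increment $X_t - X_s$ with t > s. If
under the new measure $X_t - X_s$ has the appropriate distribution independent of any
$\mathcal{F}_s$-measurable random variable, we satisfy both requirements. We verify this using
the method of generating or characteristic functions. That is, for $X_t$ as defined above
and Z an arbitrary $\mathcal{F}_s$-measurable random variable, we want

$$\mathbb{E}_Q\left[e^{i\alpha(X_t - X_s) + i\beta Z}\right] = e^{-\frac{\alpha^2}{2}(t-s)}\,\mathbb{E}_Q\left[e^{i\beta Z}\right] \tag{2.82}$$

where $\alpha, \beta \in \mathbb{R}$ are the generating parameters and $e^{-\frac{\alpha^2}{2}(t-s)}$ is the characteristic
function of a mean zero, variance t - s, Gaussian random variable.

Using the definitions above and introducing the $\mathcal{F}_t$-adapted process

$$\Lambda_t = \exp\left[-\int_0^t F_s\,dW_s - \frac{1}{2}\int_0^t F_s^2\,ds\right] \tag{2.83}$$

we find explicitly that

$$\mathbb{E}_Q\left[e^{i\alpha(X_t - X_s) + i\beta Z}\right] = \mathbb{E}_P\left[\Lambda_T\,e^{i\alpha(X_t - X_s) + i\beta Z}\right] \tag{2.84}$$

$$= \mathbb{E}_P\left[\mathbb{E}_P[\Lambda_T \,|\, \mathcal{F}_t]\,e^{i\alpha(X_t - X_s) + i\beta Z}\right] \tag{2.85}$$

$$= \mathbb{E}_P\left[\Lambda_t\,e^{i\alpha(X_t - X_s) + i\beta Z}\right] \tag{2.86}$$

$$= \mathbb{E}_P\left[\Lambda_s\,e^{\int_s^t (i\alpha F_r - \frac{1}{2}F_r^2)\,dr + \int_s^t (i\alpha - F_r)\,dW_r + i\beta Z}\right] \tag{2.87}$$

$$= e^{-\frac{\alpha^2}{2}(t-s)}\,\mathbb{E}_P\left[\Lambda_s\,e^{-\frac{1}{2}\int_s^t (i\alpha - F_r)^2\,dr + \int_s^t (i\alpha - F_r)\,dW_r + i\beta Z}\right] \tag{2.88}$$

where in reaching the last line we have completed the square and pulled out one
of the deterministic terms. The manipulations in the first three lines are simply
an application of the definition of conditional expectation (Definition 2.12), where
all terms save $\Lambda_T$ are $\mathcal{F}_t$-measurable, so that we may replace $\Lambda_T$ with $\Lambda_t$ under
the overall expectation. Similarly, since $\Lambda_s e^{i\beta Z}$ is $\mathcal{F}_s$-measurable but the remaining
exponential terms are not, we again apply conditional expectation to write

$$\mathbb{E}_Q\left[e^{i\alpha(X_t - X_s) + i\beta Z}\right] = e^{-\frac{\alpha^2}{2}(t-s)}\,\mathbb{E}_P\left[\Lambda_s\,e^{i\beta Z}\,\mathbb{E}_P\left[e^{-\frac{1}{2}\int_s^t (i\alpha - F_r)^2\,dr + \int_s^t (i\alpha - F_r)\,dW_r} \,\middle|\, \mathcal{F}_s\right]\right] \tag{2.89}$$

Focusing on the last conditional expectation term, set $\theta_t = (i\alpha - F_t)$ and define

$$dR_t = -\frac{1}{2}\theta_t^2\,dt + \theta_t\,dW_t \tag{2.90}$$

If we can show that $e^{R_t}$ is a martingale, then the conditional expectation under
consideration is simply

$$\mathbb{E}_P\left[e^{R_t - R_s} \,\middle|\, \mathcal{F}_s\right] = e^{-R_s}\,\mathbb{E}_P\left[e^{R_t} \,\middle|\, \mathcal{F}_s\right] = 1. \tag{2.91}$$

Using Ito's rule, we find

$$d\left(e^{R_t}\right) = e^{R_t}\,dR_t + \frac{1}{2}e^{R_t}\,dR_t^2 \tag{2.92}$$

$$= e^{R_t}\left(-\frac{1}{2}\theta_t^2\,dt + \theta_t\,dW_t + \frac{1}{2}\theta_t^2\,dt\right) \tag{2.93}$$

$$= \theta_t\,e^{R_t}\,dW_t \tag{2.94}$$

But since this is precisely an Ito integral driven by Gaussian white noise, we know
from the Martingale Representation Lemma 2.4 that it is indeed a martingale. Notice
also that $\Lambda_t$ is of the same form, since the minus sign on the $dW_t$ coefficient still
squares to cancel the deterministic term via the Ito correction. As such, we can drop
the conditional expectation as desired and use the martingale property of $\Lambda_t$ to write

$$\mathbb{E}_Q\left[e^{i\alpha(X_t - X_s) + i\beta Z}\right] = e^{-\frac{\alpha^2}{2}(t-s)}\,\mathbb{E}_P\left[\Lambda_s\,e^{i\beta Z}\right] \tag{2.95}$$

$$= e^{-\frac{\alpha^2}{2}(t-s)}\,\mathbb{E}_P\left[\mathbb{E}_P\left[\Lambda_T\,e^{i\beta Z} \,\middle|\, \mathcal{F}_s\right]\right] \tag{2.96}$$

$$= e^{-\frac{\alpha^2}{2}(t-s)}\,\mathbb{E}_Q\left[e^{i\beta Z}\right] \tag{2.97}$$

where in reaching the last step we have used the conditional expectation property
that $\mathbb{E}[\mathbb{E}[X | \mathcal{F}]] = \mathbb{E}[X]$ to recognize the definition of $\mathbb{E}_Q$ as desired. □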



Girsanov's theorem allows us to find a measure under which stochastic processes 
like the observations process in Eq. (2.76) are Wiener processes. If we can find a 
measure such that Yt is independent of Xt and is equivalent to a Wiener process, 
we might then be able to evaluate the conditional expectation easily. The following 
section addresses that task. 
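
Before moving on, the change of measure can be checked by Monte Carlo in the simplest possible setting. This is a sketch under stated assumptions: a one-dimensional process with constant drift F, so that only the terminal value $W_T$ needs to be sampled and $\Lambda = \exp(-F W_T - \frac{1}{2}F^2 T)$. The function name `girsanov_check`, the sample size and the parameter values are illustrative. Expectations under Q are computed as P-averages weighted by $\Lambda$.

```python
import numpy as np

def girsanov_check(F=1.0, T=1.0, n_samples=200_000, seed=2):
    """Compare moments of X_T = F*T + W_T under P and under dQ = Lambda dP."""
    rng = np.random.default_rng(seed)
    W_T = rng.normal(0.0, np.sqrt(T), size=n_samples)   # Wiener process at time T under P
    X_T = F * T + W_T                                   # Eq. (2.80) with constant drift
    Lam = np.exp(-F * W_T - 0.5 * F**2 * T)             # Eq. (2.81) with constant F
    return {
        "E_P[Lambda]": Lam.mean(),             # should be close to 1
        "E_P[X_T]": X_T.mean(),                # drifted mean F*T under P
        "E_Q[X_T]": (Lam * X_T).mean(),        # should be close to 0 under Q
        "E_Q[X_T^2]": (Lam * X_T**2).mean(),   # should be close to T under Q
    }

for name, value in girsanov_check().items():
    print(f"{name:12s} {value: .4f}")
```

Under P the process has mean FT, while the weighted averages reproduce the first two moments of a Wiener process at time T, as the theorem asserts.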






2.3.2 Bayes Formula 

Although the Radon-Nikodym theorem (Thm. 2.2) relates expectations under related 
measures, we have yet to develop a method for relating conditional expectations un- 
der different probability measures. The following formula, reminiscent of the familiar 
Bayes rule for conditional probabilities, provides a means for doing so. 

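Theorem 2.6 (Bayes formula). Let $(\Omega, \mathcal{F}, P)$ be a probability space with another
measure Q such that $P \ll Q$. Then for some sub-σ-algebra $\mathcal{G} \subset \mathcal{F}$ and random variable X such
that $\mathbb{E}_P[|X|] < \infty$, the Bayes formula relates conditional expectations as
follows:

$$\mathbb{E}_P[X \,|\, \mathcal{G}] = \frac{\mathbb{E}_Q\left[X \frac{dP}{dQ} \,\middle|\, \mathcal{G}\right]}{\mathbb{E}_Q\left[\frac{dP}{dQ} \,\middle|\, \mathcal{G}\right]} \tag{2.98}$$

where $\frac{dP}{dQ}$ is the Radon-Nikodym derivative.

Proof. Again, we follow the exposition of Lemma 7.1.3 in van Handel [2007]. Let
$S \in \mathcal{G}$. Since both sides satisfy the Kolmogorov definition of conditional expectation,
we can use the arbitrary $\mathcal{G}$-measurable random variable $I_S$ to show that both sides
satisfy the conditional expectation property. Starting from the numerator on the
right, we have

$$\mathbb{E}_Q\left[I_S\,\mathbb{E}_Q\left[X \frac{dP}{dQ} \,\middle|\, \mathcal{G}\right]\right] = \mathbb{E}_Q\left[I_S X \frac{dP}{dQ}\right] = \mathbb{E}_P[I_S X] \tag{2.99}$$

where we have used the properties of conditional expectation and the Radon-Nikodym
relation. Using the conditional expectation property again and running the above in
reverse, we find

$$\mathbb{E}_P[I_S X] = \mathbb{E}_P[I_S\,\mathbb{E}_P[X | \mathcal{G}]] = \mathbb{E}_Q\left[I_S \frac{dP}{dQ}\,\mathbb{E}_P[X | \mathcal{G}]\right] = \mathbb{E}_Q\left[I_S\,\mathbb{E}_Q\left[\frac{dP}{dQ} \,\middle|\, \mathcal{G}\right]\mathbb{E}_P[X | \mathcal{G}]\right]. \tag{2.100}$$

But since this is true for any S, it must hold without the outer expectations and
$I_S$, so that

$$\mathbb{E}_Q\left[X \frac{dP}{dQ} \,\middle|\, \mathcal{G}\right] = \mathbb{E}_Q\left[\frac{dP}{dQ} \,\middle|\, \mathcal{G}\right]\mathbb{E}_P[X | \mathcal{G}] \tag{2.101}$$

If we divide by $\mathbb{E}_Q\left[\frac{dP}{dQ} \,\middle|\, \mathcal{G}\right]$ we recover the Bayes formula. □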

With this result and the Girsanov theorem, we are now ready to solve the filtering 
problem. 
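
The Bayes formula can be verified exactly on a finite sample space, where every conditional expectation is a weighted average over the blocks of a partition. The example below is purely illustrative: the six-point space, the two probability vectors, the random variable and the partition labels are all made-up values, and `cond_exp` is a hypothetical helper name.

```python
import numpy as np

def cond_exp(values, prob, labels):
    """Conditional expectation given the sigma-algebra generated by a partition.

    The result is constant on each block of the partition, equal to the
    probability-weighted average of `values` over that block.
    """
    out = np.empty(len(values), dtype=float)
    for lab in np.unique(labels):
        block = labels == lab
        out[block] = np.sum(values[block] * prob[block]) / np.sum(prob[block])
    return out

# Two measures on a six-point sample space, with P absolutely continuous w.r.t. Q
P = np.array([0.10, 0.20, 0.05, 0.25, 0.30, 0.10])
Q = np.array([0.20, 0.10, 0.15, 0.15, 0.20, 0.20])
X = np.array([1.0, -2.0, 0.5, 3.0, 4.0, -1.0])
labels = np.array([0, 0, 1, 1, 2, 2])   # partition generating the sub-sigma-algebra G
dP_dQ = P / Q                           # Radon-Nikodym derivative

lhs = cond_exp(X, P, labels)                                            # E_P[X | G]
rhs = cond_exp(X * dP_dQ, Q, labels) / cond_exp(dP_dQ, Q, labels)       # right side of the Bayes formula
print(lhs)
print(rhs)
```

Both printed arrays should coincide entry by entry, since on each block the Q-weights times dP/dQ reduce to the P-weights.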

2.3.3 Non-Linear Filtering Equations 

With the Girsanov theorem and Bayes formula in hand, we can now proceed to find
a formula for $\pi_t[f] = \mathbb{E}_P[f(X_t) | \mathcal{F}_t^Y]$. Our first step is to find a new measure Q under
which $X_t$ and $\mathcal{F}_t^Y$ are independent. Since $X_0$ is already independent of $W_t, V_t$, our
task is really to show that $dW_t, d\bar{Y}_t$ are two independent $\mathcal{F}_t$-Wiener processes under
Q, where we have set

$$d\bar{Y}_t = d^{-1}(t)c(t, X_t)\,dt + dV_t = d^{-1}(t)\,dY_t \tag{2.102}$$

Noting that this is precisely the Girsanov form in Eq. (2.80), introduce

$$\Lambda_t = \exp\left[\int_0^t [d^{-1}(s)c(s, X_s)]^\top d\bar{Y}_s - \frac{1}{2}\int_0^t \|d^{-1}(s)c(s, X_s)\|^2\,ds\right] \tag{2.103}$$

so that the new measure $Q_T$ is defined by the density $dP/dQ_T = \Lambda_T$. From the
Girsanov theorem, we know that $\bar{Y}_t$ is a Wiener process independent of $W_t$ and $X_0$,
since for the Girsanov form in Eq. (2.80), the process is independent of the stochastic
coefficient $F_t$ under the new measure. Thus, under Q, $X_t$ and $\bar{Y}_t$ are independent
and we use Bayes formula to rewrite the conditional expectation as

$$\pi_t[f] = \mathbb{E}_P[f(X_t) | \mathcal{F}_t^Y] = \frac{\mathbb{E}_Q[\Lambda_t f(X_t) | \mathcal{F}_t^Y]}{\mathbb{E}_Q[\Lambda_t | \mathcal{F}_t^Y]} = \frac{\sigma_t(f)}{\sigma_t(1)} \tag{2.104}$$

where we have introduced the unnormalized estimate $\sigma_t$ in the obvious way. Eq.
(2.104) is known as the Kallianpur-Striebel formula.
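
The Kallianpur-Striebel formula already suggests a crude numerical filter: draw many copies of the signal that ignore the observations, as they would under Q, weight each copy by its value of $\Lambda_t$ along the observed record, and take the weighted average. The sketch below tries this for an assumed scalar model, $dX_t = -X_t\,dt + dW_t$ and $dY_t = X_t\,dt + dV_t$ with $X_0 \sim N(0,1)$ and $d(t) = 1$, which is not a model taken from the text. Since this model is linear, the result is compared against an Euler-discretized Kalman-Bucy filter, a known closed-form solution for linear systems. The function name, step size and particle count are all illustrative.

```python
import numpy as np

def kallianpur_striebel_demo(n_particles=20000, T=1.0, n_steps=200, seed=1):
    """Weighted-sample estimate of E[X_T | observations] versus a Kalman-Bucy reference."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps

    # Simulate one hidden path and its observation increments
    x = rng.normal()
    dY = np.empty(n_steps)
    for k in range(n_steps):
        dY[k] = x * dt + np.sqrt(dt) * rng.normal()
        x += -x * dt + np.sqrt(dt) * rng.normal()

    # Particles: independent copies of the signal, generated without looking at Y
    xp = rng.normal(size=n_particles)
    log_w = np.zeros(n_particles)
    # Reference: scalar Kalman-Bucy mean m and variance P, Euler-discretized
    m, P = 0.0, 1.0
    for k in range(n_steps):
        # Accumulate log Lambda_t = int X dY - (1/2) int X^2 ds for each particle
        log_w += xp * dY[k] - 0.5 * xp**2 * dt
        xp += -xp * dt + np.sqrt(dt) * rng.normal(size=n_particles)
        m, P = m - m * dt + P * (dY[k] - m * dt), P + (-2.0 * P + 1.0 - P**2) * dt

    w = np.exp(log_w - log_w.max())          # subtract the max for numerical stability
    estimate = np.sum(w * xp) / np.sum(w)    # sigma_t(f) / sigma_t(1) with f(x) = x
    return estimate, m

estimate, kalman_mean = kallianpur_striebel_demo()
print("weighted-sample estimate:", estimate)
print("Kalman-Bucy reference   :", kalman_mean)
```

The two numbers should agree up to Monte Carlo and discretization error. The weights degenerate quickly as the observation time grows, which is one reason a recursive equation for the estimate is preferable in practice.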

We now focus on deriving an SDE for the unnormalized form. We begin by using
the Ito rule to calculate

$$d\Lambda_t = \Lambda_t\,[d^{-1}(t)c(t, X_t)]^\top d\bar{Y}_t \tag{2.105}$$

and using the multi-dimensional Ito rule in Eq. (2.65)

$$df(X_t) = \mathscr{L}_t f(X_t)\,dt + [\nabla f(X_t)]^\top b(t, X_t)\,dW_t. \tag{2.106}$$

From the Ito product rule in Lemma 2.5, we find

$$f(X_t)\Lambda_t = f(X_0) + \int_0^t \Lambda_s\,\mathscr{L}_s f(X_s)\,ds + \int_0^t \Lambda_s\,[\nabla f(X_s)]^\top b(s, X_s)\,dW_s + \int_0^t f(X_s)\,\Lambda_s\,[d^{-1}(s)c(s, X_s)]^\top d\bar{Y}_s \tag{2.107}$$

where I have used the integral, rather than the SDE form and noted $\Lambda_0 = 1$. In
order to recover the $\sigma_t(f)$ form, we need to calculate $\mathbb{E}_Q[\,\cdot\,| \mathcal{F}_t^Y]$ of both sides of the
above equation. Given that the integrals are essentially sums, the expectations may
be brought inside and applied directly to the integrands. But by construction, $dW_s$
is independent of $\mathcal{F}_t^Y$ under the measure Q; after all, that is why we picked Q.
As such, the conditional part is dropped, leaving $\mathbb{E}_Q[\Lambda_s [\nabla f(X_s)]^\top b(s, X_s)\,dW_s] = 0$,
since $dW_s$ is a standard Wiener process under Q. Additionally, by properties
of conditional expectation, $\mathcal{F}_t^Y \to \mathcal{F}_s^Y$ under the integral, since for the adapted
processes under consideration, $\mathcal{F}_t^Y$ provides no extra information for conditioning
than what is already in $\mathcal{F}_s^Y$. Lastly, since $d\bar{Y}_s$ is $\mathcal{F}_s^Y$-measurable under Q, it may also
be pulled out of the conditional expectation. This leaves

$$\mathbb{E}_Q[f(X_t)\Lambda_t | \mathcal{F}_t^Y] = \mathbb{E}_Q[f(X_0) | \mathcal{F}_0^Y] + \int_0^t \mathbb{E}_Q[\Lambda_s\,\mathscr{L}_s f(X_s) | \mathcal{F}_s^Y]\,ds + \int_0^t \mathbb{E}_Q\left[f(X_s)\Lambda_s\,[d^{-1}(s)c(s, X_s)]^\top \,\middle|\, \mathcal{F}_s^Y\right] d\bar{Y}_s \tag{2.108}$$

from which we identify the Zakai equation

$$d\sigma_t(f) = \sigma_t(\mathscr{L}_t f)\,dt + \sigma_t\left([d^{-1}(t)c(t, X_t)]^\top f\right) d\bar{Y}_t. \tag{2.109}$$

In order to recover the SDE for the full filter, we note that

$$d\sigma_t(1) = \sigma_t\left([d^{-1}(t)c(t, X_t)]^\top\right) d\bar{Y}_t \tag{2.110}$$

and use the Ito rule to calculate

$$d\left(\frac{\sigma_t(f)}{\sigma_t(1)}\right) = \frac{d\sigma_t(f)}{\sigma_t(1)} - \frac{\sigma_t(f)\,d\sigma_t(1)}{\sigma_t(1)^2} - \frac{d\sigma_t(1)\,d\sigma_t(f)}{\sigma_t(1)^2} + \frac{\sigma_t(f)\,d\sigma_t(1)\,d\sigma_t(1)}{\sigma_t(1)^3} \tag{2.111}$$

Plugging in for these terms, noting that $\sigma_t(f)/\sigma_t(1) = \pi_t[f]$ and rearranging the
result leads one to the Kushner-Stratonovich equation given in the following theorem.

Theorem 2.7 (Kushner-Stratonovich). The solution to the filtering problem satisfies
the SDE

$$d\pi_t[f] = \pi_t[\mathscr{L}_t f]\,dt + \left(\pi_t[d(t)^{-1}c(t, X_t) f] - \pi_t[f]\,\pi_t[d(t)^{-1}c(t, X_t)]\right)^\top \left(d\bar{Y}_t - \pi_t[d(t)^{-1}c(t, X_t)]\,dt\right) \tag{2.112}$$

with $\pi_0[f] = \mathbb{E}_P[f(X_0)]$.

This is precisely a recursive equation of the form we desired, in which the estimate
of $f(X_t)$ is updated in place with each measurement increment $d\bar{Y}_t = d(t)^{-1}dY_t$.
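
For readers who want to see the rearrangement, here is a sketch of the intermediate steps. The shorthand $h_t = d(t)^{-1}c(t, X_t)$ is introduced only for this display and is not notation used elsewhere in the text; the products of differentials are evaluated with the Ito rule $d\bar{Y}_t\,d\bar{Y}_t^\top = I\,dt$.

```latex
\begin{align*}
d\pi_t[f]
 &= \frac{d\sigma_t(f)}{\sigma_t(1)}
  - \frac{\sigma_t(f)\,d\sigma_t(1)}{\sigma_t(1)^2}
  - \frac{d\sigma_t(1)\,d\sigma_t(f)}{\sigma_t(1)^2}
  + \frac{\sigma_t(f)\,d\sigma_t(1)\,d\sigma_t(1)}{\sigma_t(1)^3} \\
 &= \pi_t[\mathscr{L}_t f]\,dt + \pi_t[h_t f]^\top d\bar{Y}_t
  - \pi_t[f]\,\pi_t[h_t]^\top d\bar{Y}_t
  - \pi_t[h_t]^\top \pi_t[h_t f]\,dt
  + \pi_t[f]\,\pi_t[h_t]^\top \pi_t[h_t]\,dt \\
 &= \pi_t[\mathscr{L}_t f]\,dt
  + \bigl(\pi_t[h_t f] - \pi_t[f]\,\pi_t[h_t]\bigr)^\top
    \bigl(d\bar{Y}_t - \pi_t[h_t]\,dt\bigr).
\end{align*}
```

Each term in the second line comes from dividing the corresponding term of the Ito expansion by the appropriate power of $\sigma_t(1)$, and the last line simply groups the $d\bar{Y}_t$ and $dt$ contributions that share the same prefactor.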

Before exploring the details of this equation, let us first reflect on the path we
have taken in deriving it. For a seemingly simple form, what was really the point of
changing measures and constructing the $d\bar{Y}_t$ process? As was stated as motivation,
by constructing the measure Q under which $X_t$ and $\bar{Y}_t$ were independent, the conditional
expectation with respect to that measure becomes relatively trivial. Indeed,
that is what we found in calculating the Zakai equation for $\sigma_t(f)$. Due to the nature
of Q, we were able to completely drop terms involving $dW_t$. By the definition of
conditional expectation, $\mathbb{E}[f(X_t) | \mathcal{F}_t^Y]$ is precisely an orthogonal projection onto the
space of $\mathcal{F}_t^Y$-measurable random variables; since $dW_t$ is independent of $\mathcal{F}_t^Y$, it is dropped
in the orthogonal projection. But a more important feature of working under the new measure was that the
process $d\bar{Y}_t$ could be pulled out of the conditional expectation since it is manifestly
$\mathcal{F}_t^Y$-measurable under Q. As a result, the integral over $d\bar{Y}_t$ is essentially just the
averaging we sought from the beginning and is the essential property that allows us
to express the filter as an SDE over the process $d\bar{Y}_t$. The rest of the work was merely
applying Bayes formula to relate the Zakai equation for $\sigma_t(f)$ back to $\pi_t[f]$.

It is worth recognizing the following important process in the Kushner-Straton- 
ovich equation. 

Definition 2.23. The innovations process, written

$$\bar{V}_t = \bar{Y}_t - \int_0^t \pi_s[d(s)^{-1}c(s, X_s)]\,ds \tag{2.113}$$

is an $\mathcal{F}_t^Y$-Wiener process and satisfies the SDE

$$d\bar{V}_t = \left(d(t)^{-1}c(t, X_t) - \pi_t[d(t)^{-1}c(t, X_t)]\right)dt + dV_t. \tag{2.114}$$

The proof that it is a Wiener process is essentially identical to the generating function
approach used to prove Girsanov's theorem and is found in Proposition 7.2.9 in van
Handel [2007]. Another approach is to show $d\bar{V}_t$ is a martingale that satisfies the Ito
product $d\bar{V}_t^2 = dt$, which by Levy's theorem means it is a Wiener process.

[Footnote: Essentially Levy's theorem tells us that if a given process $M_t$ and the related one $M_t^2 - t$
are martingales, then $M_t$ is a Wiener process. See [Williams 1991] for more discussion.]

Structurally, the form of the innovations process gives considerable insight into
its properties. If we were to know $X_t$, the innovations process would be identically
the Wiener process $dV_t$, which is the noise corrupting the measurement that serves
no purpose save to make our lives more difficult. Looking at the SDE form for $d\bar{V}_t$,
we also see that it contains $dV_t$ in addition to the difference of the estimate and true
process value. But by definition, that piece satisfies

$$\mathbb{E}_P\left[\left(d(t)^{-1}c(t, X_t) - \pi_t[d(t)^{-1}c(t, X_t)]\right) \,\middle|\, \mathcal{F}_s^Y\right] = 0, \qquad t > s \tag{2.115}$$

so that the difference must be orthogonal to $\mathcal{F}_s^Y$. This is what gives the innovations
process its name, in that the difference $\left(d(t)^{-1}c(t, X_t) - \pi_t[d(t)^{-1}c(t, X_t)]\right)$
contains only the "new" or "innovative" information that would cause us to update
our estimate. In a more heuristic view, the innovations process tries to make the
measurements look as much as possible like the corrupting process $V_t$, so that the
filter averages that white noise away to zero. Anything that makes $\bar{V}_t$ look different
than $V_t$ is then useful information about the process of interest. The added benefit
that $\bar{V}_t$ is still a Wiener process, thanks in part to the property in Eq. (2.115), means
we can leverage all of the Ito properties we like when studying the filter.

[Footnote: It might seem weird that all the pieces used to construct $d\bar{V}_t$ come from $Y_t$, yet this
difference term is nonetheless not $\mathcal{F}_t^Y$-measurable. But note that we don't have access to
this piece by itself; we get $V_t$ along with it. The innovations process smartly pulls out the
information coming solely from $X_t$, as best as it can in the presence of $V_t$.]

Of course, the lingering important question is whether one can use the filter in
practice. Looking at Eq. (2.112), we see that calculating $\pi_t[f]$ requires calculation
of terms such as $\pi_t[\mathscr{L}_t f]$ and $\pi_t[d(t)^{-1}c(t, X_t) f]$. Plugging those terms back into
the Kushner-Stratonovich equation will undoubtedly require calculation of iterated
forms such as $\pi_t[\mathscr{L}_t^2 f]$ and beyond, until a closed set of equations is reached. In
general, we would expect to need an infinite number of equations to close the loop
for the real-valued process $X_t$. Another perspective, which will prove useful for the
quantum filter, is to work with an adjoint form of the filter, in which we introduce a
random density $p_t(x)$ which satisfies

$$\pi_t[f] = \mathbb{E}_P[f(X_t) | \mathcal{F}_t^Y] = \int f(x)\,p_t(x)\,dx. \tag{2.116}$$

Integrating the Kushner-Stratonovich equation by parts gives the nonlinear, stochastic
partial integro-differential equation

$$dp_t(x) = \mathscr{L}_t^* p_t(x)\,dt + p_t(x)\left[d^{-1}(t)\left(c(t, x) - \pi_t[c(t, x)]\right)\right]^\top d\bar{V}_t \tag{2.117}$$

where

$$\mathscr{L}_t^* p(x) = -\sum_i \frac{\partial}{\partial x^i}\left(a^i(t, x)\,p(x)\right) + \frac{1}{2}\sum_{i,j}\sum_k \frac{\partial^2}{\partial x^i\,\partial x^j}\left(b^{ik}(t, x)\,b^{jk}(t, x)\,p(x)\right) \tag{2.118}$$

This form is generally not any more useful than the Kushner-Stratonovich equation,
but is a duality similar to the Schrodinger and Heisenberg pictures in quantum
mechanics. A similar PDE can be developed for the Zakai equation ($\sigma_t(f)$), which
is at least a linear equation that admits more straightforward numerical approximations.
Of course, there is one well-known continuous distribution which requires
only a few characteristic parameters: the Gaussian distribution. In the following
section, we consider systems whose conditional state is well-described by a Gaussian
distribution and therefore admits a simple and tractable filter with wide applicability.

2.3.4 Kalman-Bucy Filter 

Perhaps the simplest system-observation pair we can consider is one governed by
the pair of linear stochastic differential equations

$$dX_t = A_t X_t\,dt + B_t\,dW_t \tag{2.119}$$
$$dY_t = C_t X_t\,dt + D_t\,dV_t \tag{2.120}$$

where $X_t, Y_t$ are $n$-, $m$-dimensional, real-valued stochastic processes, $W_t, V_t$ are
independent $k$-, $p$-dimensional Wiener processes and $A_t, B_t, C_t, D_t$ are real-valued,
non-random matrices of dimension $n \times n$, $n \times k$, $m \times n$ and $m \times p$
respectively. In physics and engineering, many problems are well-described or
well-approximated by a linear description and are often appealing due to their relative
analytical simplicity. As we will find in the following theorem, the filter for these simple
systems is also simple, making linear stochastic models attractive for practical filtering
and control applications.

Theorem 2.8 (Kalman-Bucy Filter). The solution to the linear stochastic filtering
problem, written $\pi_t[X] = \mathbb{E}_{\mathbf{P}}[X_t \mid \mathcal{F}_t^{Y}]$, with $\pi_0[X]$
Gaussian distributed, satisfies the SDE

$$d\pi_t[X] = A_t \pi_t[X]\,dt + P_t (D_t^{-1} C_t)^{T} d\bar{V}_t \tag{2.121}$$

with innovations process $d\bar{V}_t = D_t^{-1}(dY_t - C_t \pi_t[X]\,dt)$ and deterministic
covariance matrix $P_t$ satisfying

$$\frac{dP_t}{dt} = A_t P_t + P_t A_t^{T} - P_t (D_t^{-1} C_t)^{T} (D_t^{-1} C_t) P_t + B_t B_t^{T}. \tag{2.122}$$

Proof by citation and vigorous handwaving. For an excellent and detailed derivation
of the Kalman-Bucy filter, consult Øksendal [2002, Chap. 6] or the original papers
[Kalman 1960; Kalman and Bucy 1961]. Another approach is to simply use the linear
forms of $X_t$ and $Y_t$ in our results from the previous section, although there are
technical reasons we should hesitate, primarily that the change of measure $\Lambda_t$ is
generally not square-integrable. Nonetheless, such subtleties can be handled and we would
end up with the right answer. The details of the procedure are not enlightening, so I
only review the strategy, which is to consider the density form of the Zakai equation,
analogous to (2.117) and written

$$\sigma_t(f) = \int f(x)\,q_t(x)\,dx, \qquad dq_t(x) = \mathcal{L}^{*} q_t(x)\,dt + q_t(x)\left(d(t)^{-1} c(t,x)\right)^{T} d(t)^{-1}\,dY_t. \tag{2.123}$$

Plugging in the definitions for the linear system, we have

$$dq_t(x) = \left[\frac{1}{2}\sum_{i,j=1}^{n} (B_t B_t^{T})^{ij}\,\frac{\partial^{2} q_t(x)}{\partial x^{i}\,\partial x^{j}} - \sum_{i=1}^{n} \frac{\partial}{\partial x^{i}}\left((A_t x)^{i}\,q_t(x)\right)\right] dt + q_t(x)\,(D_t^{-1} C_t x)^{T} D_t^{-1}\,dY_t. \tag{2.124}$$

We would then want to check that a density of the form

$$q_t(x) = N_t \exp\left(-\tfrac{1}{2}(x - \pi_t[X])^{T} P_t^{-1} (x - \pi_t[X])\right), \tag{2.125}$$

where $N_t$ is a non-random normalization function, is a solution to Eq. (2.124). The
check involves several applications of the Itô rules followed by a comparison of terms.
The interested reader should feel free to check this for themselves; the rest of us will
have to take my word for it. □



Unlike the non-linear filter, which estimates some function of the state, $\pi_t[f(X_t)]$,
the Kalman-Bucy filter estimates the potentially multi-dimensional state itself, $\pi_t[X_t]$.
The form in Eq. (2.121) has two important pieces. A deterministic term propagates
the state according to the dynamics induced by the linear map $A_t$. Since this is a
non-random term for the true state dynamics, we should not be surprised that the
filter's estimate is simply the same dynamics applied to the estimated state. The
second term, which is proportional to the innovations process $d\bar{V}_t$, is responsible
for conditioning and depends on the deterministic covariance matrix $P_t$ (see the footnote
on the Kalman gain). Remarkably, just from the structure of the linear system-observation
pair, the appropriate weighting of the input signal is completely determined. In another
sense, our uncertainty in the estimate, given by the entries in $P_t$, is also completely
determined by the structure of the linear system-observation pair; nothing in the
observation causes us to change our certainty in the estimate. This is a direct consequence
of the Gaussianity of the stochastic processes and the linearity of the system. Due to the
nice transformation properties of Gaussians, we may trace the effect of the noise and
initial state uncertainty through the dynamics and therefore know precisely how our
uncertainty in $\pi_t[X]$ changes, weighting any updates due to the innovations process
by that uncertainty. Perhaps reassuringly, when the uncertainties in $P_t$ are large, we
weight $d\bar{V}_t$ more heavily and when we are relatively sure of the estimate, the entries
in $P_t$ are smaller and we weight the innovations less. As an added practical benefit,
the time evolution of the covariance matrix $P_t$ needs to be solved only once, using
methods in Appendix A, and the solution may be reused for each application of the
filter. The Kalman-Bucy filter is therefore a very practical tool for estimating the
state of an $n$-dimensional linear system, requiring stochastic integration of the
$n$-dimensional estimate $\pi_t[X]$ and standard integration of the distinct $n(n+1)/2$
elements in the symmetric covariance matrix $P_t$.
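
The structure of the filter can be made concrete with a minimal Python sketch of a scalar system-observation pair, $dX_t = aX_t\,dt + b\,dW_t$ and $dY_t = cX_t\,dt + d\,dV_t$. The function name `kalman_bucy_scalar`, the parameter values and the plain Euler-Maruyama discretization are assumptions made for illustration; they are not the methods of Appendices A and B.

```python
import math
import random

def kalman_bucy_scalar(a, b, c, d, x0_mean, p0, dt, n_steps, seed=0):
    """Simulate a scalar linear system and run the Kalman-Bucy filter on it.

    Returns the final true state, the final filter mean and the final variance.
    """
    rng = random.Random(seed)
    x = rng.gauss(x0_mean, math.sqrt(p0))   # true state drawn from the Gaussian prior
    m, p = x0_mean, p0                       # filter mean pi_t[X] and variance P_t
    sqdt = math.sqrt(dt)
    for _ in range(n_steps):
        dW = rng.gauss(0.0, sqdt)
        dV = rng.gauss(0.0, sqdt)
        dY = c * x * dt + d * dV             # observation increment
        x += a * x * dt + b * dW             # hidden state update
        d_innov = (dY - c * m * dt) / d      # innovations increment
        m += a * m * dt + p * (c / d) * d_innov              # mean: drift plus gain * innovations
        p += (2 * a * p - (c * p / d) ** 2 + b * b) * dt     # deterministic Riccati step
    return x, m, p
```

Note that the variance update never touches the observations, so `p` could be precomputed once and reused across runs, as described above.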

Footnote (Kalman gain): The matrix $P_t (D_t^{-1} C_t)^{T}$, which multiplies $d\bar{V}_t$, is
called the Kalman gain matrix by control theorists.

Example 2.5 (Parameter estimation). As an example use of the Kalman filter, consider
the task of estimating the forcing parameter of a particle undergoing Brownian
motion. The general techniques used will serve as a useful basis for the research
presented in Chapter 4. We begin by letting $x_t$ represent the position of the particle
and introduce the SDE

$$dx_t = \xi\,dt + dW_t, \tag{2.126}$$

where $\xi$ is the forcing term we need to estimate. Continuous measurements of the
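particle are given by the SDE

$$dy_t = x_t\,dt + dV_t. \tag{2.127}$$

While we could go through the effort to calculate $\mathbb{E}[\xi \mid \mathcal{F}_t^{y}]$ from first
principles, a more clever approach is to leverage the fact that $\xi$ is a linear parameter in
the dynamics and is thus amenable to the Kalman filter approach. That is, we define the
augmented system $X_t = [x_t, \xi]^{T}$, which gives rise to the linear system-observation
pair

$$dX_t = A X_t\,dt + B\,dW_t \tag{2.128}$$
$$dY_t = C X_t\,dt + D\,dV_t \tag{2.129}$$

where

$$A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad C = \begin{pmatrix} 1 & 0 \end{pmatrix}, \quad D = 1. \tag{2.130}$$

The covariance matrix

$$P_t = \begin{pmatrix} \Delta x_t^{2} & \Delta x_t \xi_t \\ \Delta x_t \xi_t & \Delta \xi_t^{2} \end{pmatrix} \tag{2.131}$$

admits an analytic solution using the techniques in Appendix A. Setting the initial
$P_0 = \begin{pmatrix} 0 & 0 \\ 0 & \Delta\xi_0^{2} \end{pmatrix}$, we find

$$P_t = \begin{pmatrix} \dfrac{1}{\coth t - \frac{\Delta\xi_0^{2}}{1 + t\,\Delta\xi_0^{2}}} & \dfrac{\Delta\xi_0^{2}}{\coth t - \Delta\xi_0^{2} + t\,\Delta\xi_0^{2}\coth t} \\ \dfrac{\Delta\xi_0^{2}}{\coth t - \Delta\xi_0^{2} + t\,\Delta\xi_0^{2}\coth t} & \dfrac{\Delta\xi_0^{2}}{1 + t\,\Delta\xi_0^{2} - \Delta\xi_0^{2}\tanh t} \end{pmatrix} \tag{2.132}$$

and the $\Delta\xi_t^{2}$ entry is plotted in Figure 2.1 for $\Delta\xi_0^{2} = 10^{[\ldots]}$. Ideally, we
would want to take $\Delta\xi_0^{2} \to \infty$ to reflect a complete uncertainty in $\xi$. Doing so gives

$$\lim_{\Delta\xi_0^{2} \to \infty} P_t = \begin{pmatrix} \dfrac{t}{t\coth t - 1} & \dfrac{1}{t\coth t - 1} \\ \dfrac{1}{t\coth t - 1} & \dfrac{1}{t - \tanh t} \end{pmatrix} \tag{2.133}$$

which does not reduce to $P_0$ for $t = 0$. This is because the infinite uncertainty in $\xi$
immediately washes out the certainty we had in $x_0$, since at the first time step, we
have no clue what $\xi$ and $dW_t$ will do to the particle. As such, knowing the initial
position of the particle provides essentially no help in estimating the future position
and forcing parameter when we have complete initial uncertainty in the parameter.

Figure 2.1: Plot of uncertainty in the $\xi$ parameter for the Kalman parameter
estimation in Example 2.5 with $\Delta\xi_0^{2} = 10^{[\ldots]}$.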

In order to test the filter, we use the numerical integration techniques in Appendix
B to integrate the dynamics of Eq. (2.126) for a known value of $\xi$, say $\xi = 1$. Using
this system, the measurement record for $dY_t$ is generated and fed into the filtering
equation, which constructs the innovations process and provides an estimate of the
parameter $\xi$ and the state $x_t$. Figure 2.2 shows the performance of the filter for a
single run with step-size $\Delta t = 10^{[\ldots]}$ and initial parameter uncertainty
$\Delta\xi_0^{2} = 10^{[\ldots]}$. The top plot shows the noisy measurement process $dY_t$, which is
the only signal one gets experimentally. The middle plot shows the true state $x_t$ and
filtered state $\pi_t[x]$. We see that after large initial fluctuations, the filter does a good
job of latching on to the true particle position. Similarly, the bottom plot shows large
initial fluctuations in the estimate $\pi_t[\xi]$, as the filter has difficulty distinguishing
changes in the position due to the forcing $\xi$ from changes due to the noise term $W_t$.
However, after this initial period, the Kalman filter quickly latches on to the true value
$\xi = 1$, as was suggested by the deterministic uncertainty plotted in Fig. 2.1.

Figure 2.2: Plot of Kalman filter performance for parameter estimation in
Example 2.5. Top plot shows observations record $Y_t$. Middle plot shows true state
$x_t$ in blue and estimated state $\pi_t[x]$ in red. Bottom plot shows true parameter
value $\xi = 1$ in blue and estimate value $\pi_t[\xi]$ in red.
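
Example 2.5 can be reproduced in a few lines. The following Python sketch is written under stated assumptions: the prior variance `s2 = 100`, the step size `dt = 1e-3` and the horizon `t_final = 10` are illustrative choices, not the values used for Figures 2.1 and 2.2, and the integrators (RK4 for the covariance, Euler-Maruyama for the filter) stand in for the techniques of Appendices A and B. The closed form in `p_analytic` is the covariance for $P_0 = \mathrm{diag}(0, \Delta\xi_0^2)$, rewritten with $\tanh t$ so that it is well-defined at $t = 0$; `riccati_rk4` integrates $dP/dt = AP + PA^T - PC^TCP + BB^T$ directly as a cross-check.

```python
import math
import random

def p_analytic(t, s2):
    """Closed-form (P11, P12, P22) for P_0 = diag(0, s2), written with tanh."""
    th = math.tanh(t)
    den = 1.0 + t * s2 - s2 * th
    return ((1.0 + t * s2) * th / den, s2 * th / den, s2 / den)

def riccati_rhs(p11, p12, p22):
    """dP/dt for A = [[0, 1], [0, 0]], B = [1, 0]^T, C = [1, 0], D = 1."""
    return (2.0 * p12 - p11 * p11 + 1.0, p22 - p11 * p12, -p12 * p12)

def riccati_rk4(s2, t_final, n_steps):
    """Integrate the Riccati equation with classical RK4 from P_0 = diag(0, s2)."""
    h = t_final / n_steps
    p = (0.0, 0.0, s2)
    for _ in range(n_steps):
        k1 = riccati_rhs(*p)
        k2 = riccati_rhs(*(pi + 0.5 * h * ki for pi, ki in zip(p, k1)))
        k3 = riccati_rhs(*(pi + 0.5 * h * ki for pi, ki in zip(p, k2)))
        k4 = riccati_rhs(*(pi + h * ki for pi, ki in zip(p, k3)))
        p = tuple(pi + h * (a + 2 * b + 2 * c + d) / 6.0
                  for pi, a, b, c, d in zip(p, k1, k2, k3, k4))
    return p

def run_filter(xi_true=1.0, s2=100.0, dt=1e-3, t_final=10.0, seed=1):
    """Simulate dx = xi dt + dW, dy = x dt + dV and return the estimate of xi."""
    rng = random.Random(seed)
    sqdt = math.sqrt(dt)
    x, m_x, m_xi = 0.0, 0.0, 0.0              # known x_0; prior mean of xi set to 0
    for k in range(int(round(t_final / dt))):
        p11, p12, _ = p_analytic(k * dt, s2)  # covariance is deterministic: reuse it
        dY = x * dt + rng.gauss(0.0, sqdt)
        x += xi_true * dt + rng.gauss(0.0, sqdt)
        d_innov = dY - m_x * dt               # innovations increment (D = 1)
        # mean update A*pi*dt + P C^T d_innov; C^T picks out the first column of P
        m_x, m_xi = m_x + m_xi * dt + p11 * d_innov, m_xi + p12 * d_innov
    return m_xi
```

With these choices the variance of the $\xi$ estimate at $t = 10$ is roughly $0.11$, so a single run should place the estimate within a few tenths of the true value.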



2.4 Summary 



Given such a whirlwind of a chapter, what are the takeaway points? In a broad
sense, I hope the exhausted reader is now convinced that analysis of continuous-time
stochastic processes requires the use of rigorous mathematics, including axiomatic
probability theory, measure theory and stochastic calculus. But more importantly, I
hope the reader is further convinced that one need not be an expert in these techniques
to appreciate their necessity and to use the resulting formalism gained by such
prudence. Indeed, Example 2.5 was meant to show how easy it is to apply these
techniques to solve a "real world" inference problem. Similarly, all the rigamarole
that went into defining Gaussian white noise relative to the Wiener process and
constructing stochastic processes in terms of the Itô integral can be safely placed on
the shelf; mindless applications of the Itô rule and straightforward composition of
stochastic differential equations are all we need to apply our techniques in practice.
I also hope the reader appreciates the power one gains by developing a clear
mathematical framework, particularly with regard to filtering and, although not mentioned
here, the filter's use for optimal control of stochastic systems [Lipster and Shiryayev
1977; Zhou et al. 1996].

Chapter 3

Quantum Probability and Filtering



Most modern formulations of quantum mechanics present the theory in terms of the
following postulates, here adapted from [Nielsen and Chuang 2000].

• The state of a pure quantum system is completely described by a normalized
vector $|\psi\rangle$ in a complex Hilbert space $\mathcal{H}$. A statistical ensemble of pure
states $\{|\psi_j\rangle\}$ with probabilities $p_j$ is called a mixed state and is written as the
density matrix $\rho = \sum_j p_j |\psi_j\rangle\langle\psi_j|$.

• The time evolution of a quantum system is described by a unitary operator $U_t$
and acts as $|\psi_t\rangle = U_t|\psi_0\rangle$ for pure states and $\rho_t = U_t \rho_0 U_t^{\dagger}$ for mixed states.

• Physical observations are described by self-adjoint, linear operators on $\mathcal{H}$ with
eigenvalues $\lambda_j$ and eigenprojectors $P_j$. The probability of measuring outcome
$\lambda_j$ is given by the Born rule: $\langle\psi|P_j|\psi\rangle$ for pure states and $\mathrm{Tr}[\rho P_j]$ for mixed
states.

• Given a particular measurement outcome $j$, the conditioned state is determined
via the projection postulate,

$$|\psi\rangle \to \frac{P_j|\psi\rangle}{\sqrt{\langle\psi|P_j|\psi\rangle}} \ \text{ for pure states}, \qquad \rho \to \frac{P_j \rho P_j}{\mathrm{Tr}[\rho P_j]} \ \text{ for mixed states}. \tag{3.1}$$

• The state of a composite quantum system is described by the tensor product of
the constituent systems, $|\psi^{(1)}\rangle \otimes |\psi^{(2)}\rangle \otimes \cdots$ for pure states and
$\rho^{(1)} \otimes \rho^{(2)} \otimes \cdots$ for mixed states.
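
The Born rule and projection postulate are easy to check numerically for a single qubit. The following Python sketch assumes NumPy is available; the choice of state (spin up along $x$) and observable ($\sigma_z$), and all variable names, are illustrative assumptions rather than anything fixed by the postulates.

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)        # |+z>
ket1 = np.array([0, 1], dtype=complex)        # |-z>
psi = (ket0 + ket1) / np.sqrt(2)              # normalized pure state |+x>
rho = np.outer(psi, psi.conj())               # the same state as a density matrix

# Eigenprojectors of sigma_z
P = [np.outer(ket0, ket0.conj()), np.outer(ket1, ket1.conj())]

# Born rule: <psi|P_j|psi> for pure states, Tr[rho P_j] for mixed states
probs_pure = [np.real(psi.conj() @ Pj @ psi) for Pj in P]
probs_mixed = [np.real(np.trace(rho @ Pj)) for Pj in P]

# Projection postulate for outcome j = 0, in both descriptions
psi_post = P[0] @ psi / np.sqrt(probs_pure[0])
rho_post = P[0] @ rho @ P[0] / probs_mixed[0]
```

Both descriptions agree: the two outcomes are equally likely, and the conditioned density matrix is the projector onto the conditioned pure state.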

Nascent in these postulates are rudimentary features of probability theory.
Measurement outcomes are described by probabilities, which are assigned via the quantum
state, much as the probability measure $\mathbf{P}$ assigns probabilities to elements in the
$\sigma$-algebra, or by extension, to the potential values of random variables. Similarly,
the conditioning provided by the projection postulate is analogous to conditional
expectation in probability theory. As we turn towards solving the quantum filtering
problem, in which we perform inference on the state of a quantum system conditioned
on continuous measurements of that system, it would be natural to leverage the
techniques we developed in solving the classical filtering problem. But the exposition in
the last chapter should have convinced you that care must be taken in developing a
mathematically well-posed probability theory, filtering problem and solution.

As such, the first section of this chapter reviews quantum probability theory,
stressing its differences with the classical theory developed in Chapter 2. This will
make the inchoate features noted above more precise and allows us to interpret the
projection postulate as a consequence of conditional expectation rather than as a
postulate. In so doing, we will also find how the distinctly quantum possibility of
non-commuting observables limits our ability to condition, which in turn will help
formulate the quantum filtering problem. The second section focuses on quantum
stochastic processes, particularly the quantum analog of the Wiener process which we
will relate to quadratures of the quantized electric field when in a vacuum or coherent
state. With those tools in hand, we will then solve the quantum filtering problem of
quantum optics, depicted in Fig. 3.1, where an optical field is scattered by a cloud of
atoms. Continuous measurements on the light correspond to an observations process
which may be filtered to learn about the atomic system. The exposition in this
chapter closely follows [Bouten et al. 2007a], with added perspective from Accardi
et al. [2002]; Barchielli [2003]; Geremia [2008]; Kümmerer and Maassen [1998]; van
Handel et al. [2005].

Figure 3.1: Schematic of continuous measurement in quantum optics, in which
light scattered by a cloud of atoms is continuously measured by a photodetector.



A word on notation: I will be cavalier about placing "hats" on operators in this
section, as context tends to make that clear and I find $O$ more visually pleasing than
$\hat{O}$. On occasions where confusion may ensue, I will use them.



3.1 Quantum Probability Theory 



Quantum probability theory is the non-commutative generalization of Kolmogorov's 
axiomatic probability theory. Just as in the classical case, subsuming discrete and 
continuous theories within a general measure-theoretic framework will provide an ab- 
straction capable of carefully dealing with the filtering problem. But unlike the case 
of classical probability theory, we do not start with an obvious "intuitive" theory 
of discrete quantum probability. Consequently, we begin this section by studying 
finite-dimensional quantum systems, where we can focus on the essential ingredi- 
ents of quantum probability. After that, we can extend our definitions to infinite- 
dimensional systems by dealing with the subtleties of functional analysis and measure 
theory.

3.1.1 Quantum Probability for Discrete Systems

Let us fix $\mathcal{H} = \mathbb{C}^{n}$, an $n$-dimensional, complex vector space. Observables in this
space are self-adjoint linear operators $A = A^{\dagger}$, which may be represented as $n \times n$
complex matrices. From the spectral theorem, we know that a given observable $A$
can be diagonalized as

$$A = \sum_{i} a_i P_{a_i}, \tag{3.2}$$

where $a_i \in \mathbb{R}$ satisfies the eigenvalue relation

$$A|a_i\rangle = a_i|a_i\rangle \tag{3.3}$$

for the eigenvector $|a_i\rangle$ and associated projector $P_{a_i} = |a_i\rangle\langle a_i|$. From the postulates of
quantum mechanics, we know that the probability of observing a particular outcome
$a_i$ when in the state $\rho$ is $\mathrm{Tr}[P_{a_i}\rho]$. Clearly, $A$ is a lot like a random variable, in
that it relates a particular value to a particular event, $P_{a_i}$. Indeed, the spectral
decomposition is essentially identical to the decomposition of random variables in
terms of indicator functions we considered in Eq. (2.17). We therefore see that the
set of projectors $\{P_{a_i}\}$ is much like $\mathcal{F}^{X}$, the set of events generated by some random
variable $X$. Similarly, the linear map $\mathbb{P}(P_{a_i}) = \mathrm{Tr}[P_{a_i}\rho]$ is the measure or state which
assigns probabilities to those events. It is important to note that this relation is
clearest in the Heisenberg picture, where the state remains fixed and the observables
change in time. This is in analogy to stochastic processes, which change in time
relative to a fixed probability measure.

Things get a bit more complicated if we want to describe joint probabilities for two
different events. Classically, we simply have sets $F_1, F_2 \in \mathcal{F}$, so the joint probability
for the two events is $\mathbf{P}(F_1 \cap F_2) = \mathbb{E}_{\mathbf{P}}[\chi_{F_1}\chi_{F_2}]$. In quantum mechanics, we
consider projectors $P_{a_i}, P_{b_j}$ for two different observables $A, B$. We then hope that the
joint probability of observing outcome $a_i$ and $b_j$ is $\mathbb{P}(P_{a_i}P_{b_j}) = \mathrm{Tr}[P_{a_i}P_{b_j}\rho]$. Yet, $A$
and $B$ will not commute in general, so that the joint probability calculation depends on
the order of the projectors involved. But this is entirely contrary to what we mean
by a joint probability, which is equivalent to the yes/no question "Did outcome
$a_i$ and outcome $b_j$ occur?". Surely this must be the same as the question "Did
outcome $b_j$ and outcome $a_i$ occur?". However, we simply cannot pose this question
unambiguously in quantum mechanics. This is no surprise really, as in a given
experiment, we cannot ascribe underlying values to all observables consistently; i.e.
there is no (local) hidden variable model for the system. More concretely, if given
a quantum spin, there is no sensible way to describe the event that the $x$ and $y$
projections take on specific values simultaneously (see the footnote on projective
measurements).
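
The order dependence is easy to exhibit for a single spin. In the following Python sketch (NumPy assumed; the choice of the projectors onto spin up along $x$ and $y$ and of a state pointing up along $z$ is an illustrative assumption), the candidate "joint probability" $\mathrm{Tr}[P_{a}P_{b}\rho]$ is not even real and changes when the projectors are swapped.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)

P_plus_x = (I2 + sx) / 2                            # projector onto spin up along x
P_plus_y = (I2 + sy) / 2                            # projector onto spin up along y
rho = np.array([[1, 0], [0, 0]], dtype=complex)     # spin up along z

joint_xy = np.trace(P_plus_x @ P_plus_y @ rho)      # "x then y" ordering
joint_yx = np.trace(P_plus_y @ P_plus_x @ rho)      # "y then x" ordering
commutator = P_plus_x @ P_plus_y - P_plus_y @ P_plus_x
```

Here `joint_xy` evaluates to $(1+i)/4$ and `joint_yx` to $(1-i)/4$, neither of which can be interpreted as a probability.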

The "incompatibility" of non-commuting quantum events is really the only de- 
parture from classical probability theory. In essence, it states that for a single experi- 
mental realization, we may only speak sensibly about a set of commuting observables 
or events; all other non-commuting events are incompatible with the experiment un- 
der consideration and it makes no sense to discuss their probabilities. Thus, our first 
step in constructing a quantum probability space is to fix a set of commuting
observables in a mathematically well-defined structure.

Definition 3.1. A *-algebra $\mathcal{A}$ is a set of operators that is closed under arbitrary
complex-linear combinations, products and adjoints of its members and contains the
identity operator. A commutative *-algebra is a *-algebra whose elements all commute.

As was the case classically, it will often be useful to consider generating such a set
from a particular observable $A$.

Definition 3.2. Given an operator $A$, the set $\mathcal{A} = \{X : X = f(A),\ f : \mathbb{R} \to \mathbb{C}\}$ is
the smallest commutative *-algebra generated by $A$.

Footnote (projective measurements): Note that we are talking about projective
measurements on a single system, not generalized measurements which might allow for
imprecise, but simultaneous, measurements of non-commuting observables. Such
measurements will fit within the quantum probability formalism by explicitly accounting
for the auxiliary systems needed to perform them.

The generated *-algebra captures the structure of compatible observations, in
that given the spectral decomposition of the observable of interest, $A$, we may directly
calculate any observable $f(A) \in \mathcal{A}$ as

$$f(A) = \sum_{i} f(a_i) P_{a_i}. \tag{3.4}$$

Thus, if we measure outcome $a_i$, we immediately know the outcome for any compatible
observation, specifically $f(a_i)$, up to any degeneracies in the eigenspectrum.
It is therefore the eigenspace, represented by the label $i$, which truly characterizes
compatible observables, where the actual value $a_i$ is just there to give us the correct
units. As we will soon see, this is enough to develop most of a corresponding classical
probability space. The only remaining ingredient is to formalize the measure for the
space, as given in the following definition.

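Definition 3.3. A state on a *-algebra $\mathcal{A}$ is a linear map $\mathbb{P} : \mathcal{A} \to \mathbb{C}$ which is
positive, $A \geq 0 \Rightarrow \mathbb{P}(A) \geq 0$, and normalized, $\mathbb{P}(I) = 1$. Note that one can always
write this as $\mathbb{P}(A) = \mathrm{Tr}[A\rho]$ for some density matrix $\rho$.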

We now have all the ingredients necessary to map a given commutative *-algebra 
and state into a corresponding classical probability space. 

Theorem 3.1 (Spectral Theorem, Finite Dimensions, (Adapted from Theorem 2.4
in [Bouten et al. 2007a])). Let $\mathcal{A}$ be a commutative *-algebra on a
finite-dimensional Hilbert space and let $\mathbb{P}$ be a state on $\mathcal{A}$. Then there exists a
probability space $(\Omega, \mathcal{F}, \mathbf{P})$ and a linear, bijective map $\iota$ from elements of $\mathcal{A}$ to
measurable functions on $\Omega$ such that $\iota(AB) = \iota(A)\iota(B)$ and $\iota(A^{\dagger}) = \iota(A)^{*}$, and the
probability measure is determined by $\mathbb{P}(A) = \mathbb{E}_{\mathbf{P}}[\iota(A)]$.

Proof. We will simply construct the probability space by hand, taking care to formalize
the intuitive relations between projectors and events discussed above. To begin,
given that $\mathcal{A}$ is commutative, we may simultaneously diagonalize each $n \times n$ matrix
$A \in \mathcal{A}$; for convenience, suppose that each $A$ is already diagonal with entries $A_{ii}$.
Then set $\Omega = \{1, \ldots, n\}$, so that $i \in \Omega$ serve as labels for the different eigenspaces.
Define the map $\iota(A) : \Omega \to \mathbb{C}$ by $\iota(A)(i) = A_{ii}$. Thus the map $\iota$ takes operators in
$\mathcal{A}$ to random variables on the dummy sample space $\Omega$. Each random variable $\iota(A)$
just takes on the appropriate eigenvalue of $A$ when given the eigenspace label $i \in \Omega$.
We then generate the $\sigma$-algebra as $\mathcal{F} = \sigma\{\iota(A) : A \in \mathcal{A}\}$ and define the probability
measure by $\mathbf{P}(F) = \mathbb{P}(\iota^{-1}(\chi_{F}))$ for $F \in \mathcal{F}$. □

Thus, a commutative *-algebra and quantum state are equivalent to a classical
probability space. Once restricted to a commuting set of observables, there is nothing
particularly quantum left to worry about. Of course, we will want to consider
a variety of experimental realizations, in which on each trial we might study different
observables which do not commute. This generalization suggests the following
definition of a finite-dimensional quantum probability space.

Definition 3.4. A finite-dimensional quantum probability space is the pair $(\mathcal{N}, \mathbb{P})$,
where $\mathcal{N}$ is a *-algebra of operators on a finite-dimensional Hilbert space and $\mathbb{P}$ is
a state on $\mathcal{N}$.

Note that unlike a classical probability space, there is no sample space in the quantum
setting; the corresponding classical space simply inherits an $\Omega$ passively through the
eigenspace labels. For the $n$-dimensional space $\mathcal{H}$, we tend to take $\mathcal{N}$ to be the set of
all bounded operators on that space, written $\mathcal{B}(\mathcal{H})$. For a given experimental setup,
one selects the commutative sub-*-algebra $\mathcal{A} \subset \mathcal{N}$ relevant for the observations
we intend to make. Using Theorem 3.1, one can then construct the corresponding
classical probability space and calculate a variety of statistics using techniques from
the previous chapter.
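
The construction in Theorem 3.1 can be sketched numerically. The following Python example (NumPy assumed) builds two commuting observables from a common eigenbasis and maps them to random variables on a three-point sample space. The unitary `V`, the eigenvalues, the state `rho` and the helper name `iota` are all hypothetical choices for illustration, and the sample space is indexed from 0 rather than 1.

```python
import numpy as np

# Columns of V form the common eigenbasis (an arbitrary fixed unitary for this sketch)
V = np.array([[1, 1, 0],
              [1, -1, 0],
              [0, 0, np.sqrt(2)]], dtype=complex) / np.sqrt(2)
A = V @ np.diag([1.0, 2.0, 3.0]) @ V.conj().T
B = V @ np.diag([0.5, -1.0, 4.0]) @ V.conj().T     # commutes with A by construction

def iota(X):
    """Map an operator in the commutative algebra to a function on Omega = {0, 1, 2}."""
    return np.real_if_close(np.diag(V.conj().T @ X @ V))

rho = np.diag([0.5, 0.3, 0.2]).astype(complex)     # quantum state P(X) = Tr[X rho]
prob = np.real(np.diag(V.conj().T @ rho @ V))      # classical measure on the labels

quantum_expectation = np.real(np.trace(A @ rho))
classical_expectation = np.dot(iota(A), prob)
```

The map respects products and adjoints, and the quantum expectation of `A` matches the classical expectation of the random variable `iota(A)`.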

Example 3.1 (Example 2.6 in [Bouten et al. 2007a]). As a concrete example, consider
a single spin-1/2 particle or qubit, which has Hilbert space $\mathcal{H} = \mathbb{C}^{2}$. The
*-algebra of operators may be expanded as

$$\mathcal{N} = \{\alpha_0 I + \alpha_1 \sigma_x + \alpha_2 \sigma_y + \alpha_3 \sigma_z : \alpha_i \in \mathbb{C}\} \tag{3.5}$$

where the Pauli matrices are given by

$$\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{3.6}$$

To round out the quantum probability space, we consider the pure qubit state pointing
up along $x$, written $|{+x}\rangle = \frac{1}{\sqrt{2}}(1, 1)^{T}$ in the standard basis, so that the quantum
probability state is $\mathbb{P}(A) = \langle{+x}|A|{+x}\rangle$. This completes the quantum probability
space $(\mathcal{N}, \mathbb{P})$.

In order to apply the spectral theorem, we select the commutative sub-algebra $\mathcal{A}$
generated by the observable $\sigma_z$. Admittedly, there aren't really many other
interesting observables in this commutative algebra, but we can still work through the
quantum probability formalism. Since $\sigma_z$ is already diagonal as written, we read off
the two eigenvalues $\pm 1$ and projectors

$$P_{+z} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad P_{-z} = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}. \tag{3.7}$$

Applying the spectral theorem, we introduce $\Omega = \{1, 2\}$ and $\mathcal{F} = \{\emptyset, \{1\}, \{2\}, \Omega\}$.
Since observables in $\mathcal{A}$ are of the form $\alpha P_{+z} + \beta P_{-z}$ for $\alpha, \beta \in \mathbb{C}$, we simply need to
know how $\iota$ acts on the projectors. This is simply $\iota(P_{+z}) = \chi_{\{1\}}$ and $\iota(P_{-z}) = \chi_{\{2\}}$. We
then see that, for example, $\mathbf{P}(\{1\}) = \mathbb{P}(\iota^{-1}(\chi_{\{1\}})) = \langle{+x}|P_{+z}|{+x}\rangle = 1/2$ as expected.
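
The numbers in Example 3.1 can be verified directly. This short Python sketch (NumPy assumed) evaluates the state on the two projectors and compares a quantum expectation with its classical counterpart; the coefficients `alpha` and `beta` and the helper `state` are arbitrary illustrative choices.

```python
import numpy as np

plus_x = np.array([1, 1], dtype=complex) / np.sqrt(2)    # |+x> in the standard basis
P_plus_z = np.diag([1, 0]).astype(complex)
P_minus_z = np.diag([0, 1]).astype(complex)

def state(X):
    """Quantum state P(X) = <+x|X|+x> (real part, valid for observables)."""
    return np.real(plus_x.conj() @ X @ plus_x)

# Classical measure on Omega = {1, 2} inherited through the map iota
prob = {1: state(P_plus_z), 2: state(P_minus_z)}

# A generic element of the commutative algebra and its classical expectation
alpha, beta = 2.0, -3.0
X = alpha * P_plus_z + beta * P_minus_z
expectation_quantum = state(X)
expectation_classical = alpha * prob[1] + beta * prob[2]
```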

The quantum probability formalism will also allow us to calculate conditional
expectations, in which we determine the expected value of a future measurement
outcome given a current measurement outcome. Clearly such an expectation only
makes sense when the two measurements are compatible, otherwise there would never
be an experiment in which we could even in theory attempt to assign observed values
to each measurement simultaneously. Yet, this may appear troubling at first.
For example, consider a spin-1/2 particle, on which we seek to condition a $\sigma_y$
measurement given a $\sigma_z$ measurement. Although these observables do not commute, it
appears completely sensible to calculate a future expected $\sigma_y$ measurement given a
$\sigma_z$ outcome. Indeed, we know it to be precisely zero, since the quantum state is in
one of the two $\sigma_z$ eigenstates after the $\sigma_z$ measurement and both eigenstates have
zero $\sigma_y$ expectation. We clearly have a consistent way to describe observed values for
these non-commuting observables, so how do we reconcile this with the limitations
imposed by quantum probability theory?

It is actually straightforward if we carefully consider what conditional expectation
means in this context. Classically for two events $A, B$, the conditional probability
of $B$ given $A$ is the probability that $B$ is true given that $A$ is also true in the
same realization. For the spin under consideration, a naive statement of conditional
expectation corresponds to the current expected $y$-projection value of the spin given
that it also currently has a particular $z$-projection value. We know that this is not
sensible from fundamental quantum uncertainty, as the spin cannot have perfectly
defined $\sigma_z$ and $\sigma_y$ values at the same time. However, it is more likely that we
meant to consider the conditional expectation which corresponds to the expected $\sigma_y$
measured value conditioned on a previous $\sigma_z$ measurement. But this means that the
expected $\sigma_z$ value is actually written down somewhere and in order to sensibly talk
about performing both measurements, we really need to include this other physical
system which was used to measure the spin indirectly. This corresponds to including
a physical model of the measurement apparatus or probe system used to perform
the indirect $\sigma_z$ measurement in our quantum probability model. After all, in an
experiment there is some physical process by which we learn the direction of the
spin, perhaps by coupling the position of the particle to its spin state via a
Stern-Gerlach device, after which the position tells us about the spin state. By including
such extra quantum degrees of freedom explicitly, we can then pose the measurement
of $\sigma_z$ as an indirect measurement on an auxiliary space, which will then commute
with direct $\sigma_y$ measurements on the spin (see the footnote on the measurement chain).

Continuing along then, we see that conditional expectation can be posed sensibly
if we include the measurement model within the quantum probability space.
We therefore define the conditional expectation by first selecting the commutative
sub-algebra $\mathcal{A} \subset \mathcal{N}$ which represents the measurement we will condition upon.
Then there is some other set $\mathcal{A}' = \{B \in \mathcal{N} : AB = BA\ \ \forall A \in \mathcal{A}\}$, called
the commutant, which represents the set of observables which can be simultaneously
diagonalized with any $A \in \mathcal{A}$. For some $B \in \mathcal{A}'$, the conditional expectation
is then inherited from the corresponding classical probability space as
$\mathbb{P}(B|\mathcal{A}) = \iota^{-1}(\mathbb{E}_{\mathbf{P}}(\iota(B)\,|\,\sigma\{\iota(\mathcal{A})\}))$. It is important to note that elements in $\mathcal{A}'$
need not commute with each other, just as they need not be in $\mathcal{A}$ directly. Physically,
the elements in $\mathcal{A}$ are the commutative set of observables on the probe system
and elements in $\mathcal{A}'$ are the observables on the initial quantum system, which trivially
commute with members in $\mathcal{A}$ but not necessarily each other. The example at the
end of this section should help clarify these different *-algebras.

Footnote (measurement chain): Perhaps this seems like only sidestepping the issue, as
one can always question why one measurement is considered direct whereas the other is
considered indirect. Moreover, how do we measure the position of the spin after it goes
through the Stern-Gerlach device? Isn't that just another measurement that also requires
a physical measurement model? I agree that the so-called Heisenberg chain of
measurements is unsettling, but the issues are more philosophical than practical. At some
point, perhaps all the way to the neurons in our brain, we will assume that a projective
measurement happens. For the sake of being able to consider conditional expectation and
inference within the quantum probability setting, it will be sufficient to consider
projective measurements only one level away, on the probe system, which could include the
entire universe save the primary quantum system if that is more comforting.

Although this is enough to perform calculations, one would hope that the abstract
mapping between quantum and classical in Theorem 3.1 would allow us to
calculate the conditional expectation without explicitly working through the $\iota$
mapping. This turns out to be possible, especially in light of the least-squares projection
interpretation of conditional expectation. The finite-dimensional *-algebra is actually
a finite-dimensional linear space with the Hilbert-Schmidt inner product
$\langle A, B\rangle = \mathbb{P}(A^{\dagger}B)$ (see the footnote on the norm). The conditional expectation is then
precisely the orthogonal projection from $\mathcal{A}'$ onto the linear subspace $\mathcal{A}$. We can expand
this projection easily in terms of an orthogonal basis for $\mathcal{A}$, which from the spectral
theorem is simply the set of eigenprojectors $\{P_{a_i}\}$ of $\mathcal{A}$. We then have

$$\mathbb{P}(B|\mathcal{A}) = \sum_{i} \frac{\mathbb{P}(P_{a_i} B)}{\mathbb{P}(P_{a_i})}\,P_{a_i}, \tag{3.8}$$

which looks exactly like our explicit formula for discrete conditional expectations in
Eq. (2.21). Similar to what we saw in that equation, the conditional expectation
is an operator in $\mathcal{A}$ and we see that the weighting factors in that basis, given by
$\mathbb{P}(P_{a_i}B)/\mathbb{P}(P_{a_i})$, are the expected values of $B$ restricted to that eigenspace. Note
that if $B \notin \mathcal{A}'$, the inner product would depend on the order of its arguments and
would in general give a complex coefficient in the sum even if $B$ were an observable.
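
The projection formula for the conditional expectation can be exercised on a small example. The following Python sketch (NumPy assumed) conditions a system observable on a probe measurement for a correlated two-qubit state. The function name `conditional_expectation`, the particular state `Phi` and the choice of observables are hypothetical and serve only to illustrate the formula.

```python
import numpy as np

def conditional_expectation(B, projectors, rho):
    """sum_a Tr[rho P_a B] / Tr[rho P_a] * P_a, for B commuting with every P_a."""
    out = np.zeros_like(B, dtype=complex)
    for P in projectors:
        weight = np.trace(rho @ P)
        if abs(weight) > 1e-12:                 # skip outcomes of probability zero
            out += np.trace(rho @ P @ B) / weight * P
    return out

I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

# Commutative algebra generated by sigma_z on the second (probe) qubit
projs = [np.kron(I2, (I2 + sz) / 2), np.kron(I2, (I2 - sz) / 2)]
B = np.kron(sx, I2)                             # sigma_x on the first (system) qubit

# Correlated state sqrt(0.8)|+x,+z> + sqrt(0.2)|-x,-z>
plus_x = np.array([1, 1], dtype=complex) / np.sqrt(2)
minus_x = np.array([1, -1], dtype=complex) / np.sqrt(2)
plus_z = np.array([1, 0], dtype=complex)
minus_z = np.array([0, 1], dtype=complex)
Phi = np.sqrt(0.8) * np.kron(plus_x, plus_z) + np.sqrt(0.2) * np.kron(minus_x, minus_z)
rho = np.outer(Phi, Phi.conj())

cond = conditional_expectation(B, projs, rho)
# Residual B - cond should be orthogonal to every basis projector under <A, B> = P(A^dag B)
residuals = [np.trace(rho @ P @ (B - cond)) for P in projs]
```

For this state the probe outcome reveals the sign of $\sigma_x$ on the system, so `cond` reduces to $I \otimes \sigma_z$, and the residuals vanish as the least-squares interpretation requires.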

Before attempting to extend these definitions to infinite-dimensional spaces, we 
close this section with a physical example which will hopefully clarify the above 
definitions. 

Footnote (norm): Again, it is actually not quite enough to be a norm, as
$\|A\|^{2} = \langle A, A\rangle$ may be zero even if $A$ is not the zero operator.

Example 3.2 (Based on Example 2.9 in [Bouten et al. 2007a]). We work with
the qubit system introduced in Example 3.1, but here consider conditioning a $\sigma_z$
measurement on an initial $\sigma_x$ measurement. As we just found in developing the
conditional expectation, since $[\sigma_x, \sigma_z] \neq 0$, we need to introduce an auxiliary probe
system in order to discuss conditioning the measurement. As such, we introduce
another qubit system, with quantum probability space $(\mathcal{N}_p, \mathbb{P}_p)$, so that the joint
space is $(\mathcal{N}_s \otimes \mathcal{N}_p, \mathbb{P}_s \otimes \mathbb{P}_p)$, where the subscripts stand for system and probe. Our
measurement procedure should work for any system state (after all, the point of
measuring is to learn something we don't know), so it is described by the arbitrary
density matrix $\rho_s$. Conversely, the probe must start in a known fiducial state, here
$|{+z}\rangle$, so that any changes in its state reflect information about the system, thus
$\mathbb{P}_p(A) = \mathrm{Tr}[A P_{+z}]$.

Now suppose we are only capable of performing $\sigma_z$ measurements. Therefore,
in order to perform the indirect $\sigma_x$ system measurement using the probe qubit, we
must find a unitary $U$ such that measuring $U^{\dagger}(I \otimes \sigma_z)U$ gives the same statistics
as measuring $\sigma_x \otimes I$ would on the system prior to the interaction. Also note that
the future direct $\sigma_z$ measurement on the spin will then commute with this indirect
measurement, i.e. $[U^{\dagger}(I \otimes \sigma_z)U, U^{\dagger}(\sigma_z \otimes I)U] = 0$, so that $U^{\dagger}(\sigma_z \otimes I)U$ is in the
commutant of $U^{\dagger}(I \otimes \sigma_z)U$ and the conditional expectation is well-defined.

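Following a general procedure in Example 2.9 in [Bouten et al. 2007a], we construct
the unitary

$$U = P_{+x} \otimes I + P_{-x} \otimes \sigma_x, \tag{3.9}$$

where

$$P_{\pm x} = |{\pm x}\rangle\langle{\pm x}|, \qquad \sigma_x = |{+z}\rangle\langle{-z}| + |{-z}\rangle\langle{+z}|. \tag{3.10}$$

We now verify explicitly that measuring $\pm z$ on the probe qubit occurs with the
same probabilities as measuring $\pm x$ on the initial system qubit. The probability of
measuring $+z$ is given by

$$\mathbb{P}_s \otimes \mathbb{P}_p(U^{\dagger}(I \otimes P_{+z})U) = \mathbb{P}_s \otimes \mathbb{P}_p(P_{+x} \otimes P_{+z} + P_{-x} \otimes P_{-z}) \tag{3.11}$$
$$= \mathbb{P}_s(P_{+x})\,\mathbb{P}_p(P_{+z}) + \mathbb{P}_s(P_{-x})\,\underbrace{\mathbb{P}_p(P_{-z})}_{=0} \tag{3.12}$$
$$= \mathbb{P}_s(P_{+x}) \tag{3.13}$$

where the particular initial probe state $|{+z}\rangle$ implies $\mathbb{P}_p(P_{-z}) = \mathrm{Tr}[P_{-z}P_{+z}] = 0$.
Similarly, the probability for measuring $-z$ is given by

$$\mathbb{P}_s \otimes \mathbb{P}_p(U^{\dagger}(I \otimes P_{-z})U) = \mathbb{P}_s \otimes \mathbb{P}_p(P_{+x} \otimes P_{-z} + P_{-x} \otimes P_{+z}) \tag{3.14}$$
$$= \mathbb{P}_s(P_{-x}) \tag{3.15}$$

so that the probabilities correspond as desired.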

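Given $U$, we may now consider the conditional expectation. We set $\mathcal{A}$ as the commutative *-algebra generated by the probe measurement $U^\dagger(I \otimes \sigma_z)U$ so that $U^\dagger(\sigma_z \otimes I)U \in \mathcal{A}'$ as desired. From Eq. 3.8, we find

$$\mathbb{P}_s \otimes \mathbb{P}_p\big(U^\dagger(\sigma_z \otimes I)U \,\big|\, \mathcal{A}\big) = \sum_{a = \pm z} \frac{\mathbb{P}_s \otimes \mathbb{P}_p\big(U^\dagger(\sigma_z \otimes P_a)U\big)}{\mathbb{P}_s \otimes \mathbb{P}_p\big(U^\dagger(I \otimes P_a)U\big)}\; U^\dagger(I \otimes P_a)U. \tag{3.16}$$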

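Without loss of generality, let's consider one of the conditional probability terms in this sum, say for $a = +z$, the $\mathbb{P}_s \otimes \mathbb{P}_p(U^\dagger(\sigma_z \otimes P_{+z})U)/\mathbb{P}_s \otimes \mathbb{P}_p(U^\dagger(I \otimes P_{+z})U)$ factor. We know from Eq. (3.13) that the denominator is simply the probability for the system qubit to be measured in $+x$, i.e. $\mathbb{P}_s(P_{+x})$. Focusing on the numerator, we find

$$\begin{aligned}
\mathbb{P}_s \otimes \mathbb{P}_p\big(U^\dagger(\sigma_z \otimes P_{+z})U\big) ={}& \mathbb{P}_s \otimes \mathbb{P}_p(P_{+x}\sigma_z P_{+x} \otimes P_{+z}) + \mathbb{P}_s \otimes \mathbb{P}_p(P_{-x}\sigma_z P_{+x} \otimes \sigma_x P_{+z}) \\
&+ \mathbb{P}_s \otimes \mathbb{P}_p(P_{+x}\sigma_z P_{-x} \otimes P_{+z}\sigma_x) + \mathbb{P}_s \otimes \mathbb{P}_p(P_{-x}\sigma_z P_{-x} \otimes P_{-z}) \qquad (3.19)
\end{aligned}$$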

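But since $\mathbb{P}_p(P_{-z}) = \mathbb{P}_p(\sigma_x P_{+z}) = \mathbb{P}_p(P_{+z}\sigma_x) = 0$ and $\mathbb{P}_p(P_{+z}) = 1$, only the first term survives. A similar calculation holds for the $a = -z$ term in the sum, so that the conditional expectation is

$$\mathbb{P}_s \otimes \mathbb{P}_p\big(U^\dagger(\sigma_z \otimes I)U \,\big|\, \mathcal{A}\big) = \frac{\mathbb{P}_s(P_{+x}\sigma_z P_{+x})}{\mathbb{P}_s(P_{+x})}\, U^\dagger(I \otimes P_{+z})U + \frac{\mathbb{P}_s(P_{-x}\sigma_z P_{-x})}{\mathbb{P}_s(P_{-x})}\, U^\dagger(I \otimes P_{-z})U. \tag{3.20}$$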

Recalling that $\mathbb{P}_s(A) = \mathrm{Tr}[A\rho_s]$, we introduce the conditioned density matrices $\rho_{\pm x} = P_{\pm x}\rho_s P_{\pm x}/\mathrm{Tr}[P_{\pm x}\rho_s]$, so that we may further simplify our expression to

$$\mathbb{P}_s \otimes \mathbb{P}_p\big(U^\dagger(\sigma_z \otimes I)U \,\big|\, \mathcal{A}\big) = \mathrm{Tr}[\rho_{+x}\sigma_z]\, U^\dagger(I \otimes P_{+z})U + \mathrm{Tr}[\rho_{-x}\sigma_z]\, U^\dagger(I \otimes P_{-z})U. \tag{3.21}$$

We see that the conditional expectation is a diagonal observable in $\mathcal{A}$, where the eigenvalues associated with each outcome of the probe measurement are precisely the conditional expectation values one finds using the Born rule! That is, once the probe measurement determines whether outcome $U^\dagger(I \otimes P_{+z})U$ or $U^\dagger(I \otimes P_{-z})U$ occurs, this conditional observable immediately reduces to the corresponding expected value of $\sigma_z$ for the conditioned qubit system state. What is perhaps remarkable is that the Born rule is then a consequence of conditional expectation, which is not an axiomatic definition, but a derived one following the Radon-Nikodym approach and using the least-squares criterion. This is in contrast to standard quantum mechanics, where the Born rule is assumed axiomatically.
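The indirect measurement above is small enough to check numerically. The following Python/NumPy sketch assumes the interaction unitary $U = P_{+x} \otimes I + P_{-x} \otimes \sigma_x$ (the form consistent with Eqs. (3.11) and (3.14)), draws an arbitrary system state, and compares the probe statistics and the coefficients of the conditional expectation against the expressions derived above. The helper names (`expect`, `heis`, `cond_coeff`) and the random test state are illustrative choices, not part of the original example.

```python
import numpy as np

# Pauli operators and projectors onto the +/-x and +/-z eigenstates.
I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
P_px, P_mx = (I2 + sx) / 2, (I2 - sx) / 2
P_pz, P_mz = (I2 + sz) / 2, (I2 - sz) / 2

# Assumed interaction unitary on system (x) probe.
U = np.kron(P_px, I2) + np.kron(P_mx, sx)

# Arbitrary (randomly drawn) system state; the probe starts in |+z>.
rng = np.random.default_rng(0)
G = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho_s = G @ G.conj().T
rho_s /= np.trace(rho_s).real
rho = np.kron(rho_s, P_pz)


def expect(X):
    """Joint state P_s (x) P_p evaluated on the operator X."""
    return np.trace(X @ rho)


def heis(X):
    """Heisenberg-picture operator U^dagger X U."""
    return U.conj().T @ X @ U


def cond_coeff(B, P_a):
    """Coefficient of U^dagger (I (x) P_a) U in the conditional
    expectation of U^dagger (B (x) I) U given the probe measurement."""
    return expect(heis(np.kron(B, P_a))) / expect(heis(np.kron(I2, P_a)))


# Probe statistics versus a direct sigma_x measurement on the system.
p_probe_plus = expect(heis(np.kron(I2, P_pz))).real
p_system_plus = np.trace(P_px @ rho_s).real

# Conditioned system state after the "+x" outcome.
rho_px = P_px @ rho_s @ P_px / p_system_plus
```

With these definitions, `cond_coeff(sz, P_pz)` should agree with `np.trace(rho_px @ sz)`, which is the coefficient appearing in Eq. (3.21).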



3.1.2 Quantum Probability Spaces 

The task of developing a general quantum probability theory which describes both 
finite and infinite dimensional spaces is fraught with the same difficulties we faced in 
developing a general classical probability theory, but now the infinities can confound 
us in two ways — issues related to simply describing infinite dimensional quantum 
systems and issues related to describing infinite dimensional probability spaces. For 
the former case, this means the relatively straightforward linear algebraic tools in the 
previous section must be promoted to more sophisticated functional analysis tools. 
For the latter, we again will use methods of measure theory. 

We begin by considering a complex Hilbert space $\mathcal{H}$, which may be finite or infinite dimensional. We further consider $\mathcal{B}(\mathcal{H})$, the set of bounded, linear operators on $\mathcal{H}$. By restricting consideration to bounded operators for the time being, we can avoid some details which are better handled after introducing the quantum probability space. As is familiar for quantum systems, the Hilbert space adjoint of an operator $A \in \mathcal{B}(\mathcal{H})$ is written $A^\dagger$ and is defined by $\langle\psi|A^\dagger|\phi\rangle = \langle\phi|A|\psi\rangle^*$ for all $|\psi\rangle, |\phi\rangle \in \mathcal{H}$.

Given that $\mathcal{B}(\mathcal{H})$ is already a Hilbert space (a complex vector space with norm given by the trace inner product) with operator multiplication, it is an algebra. Adding in the adjoint operation via $\dagger$ makes $\mathcal{B}(\mathcal{H})$ a *-algebra by Definition 3.1.

One would hope that a *-algebra defines a suitable set of operators for a quantum probability space, but this is not true for infinite-dimensional systems. In particular, we are faced with issues of convergence of a sequence of such operators, which is important for defining quantum probability operations as a limit of sequences of simple operators. The problem is that there are multiple types of convergence which induce different topologies on $\mathcal{B}(\mathcal{H})$. Consider a sequence of operators $\{T_n\}$ on $\mathcal{H}$. By stating that $T_n$ converges to $T$, we could mean that $\|T_n - T\| \to 0$, where the norm is induced via the Hilbert-Schmidt, trace inner-product norm on the $\mathcal{B}(\mathcal{H})$ Hilbert space. We could instead mean that $T_n|\psi\rangle \to T|\psi\rangle$ for any $|\psi\rangle \in \mathcal{H}$ or that $f(T_n|\psi\rangle) \to f(T|\psi\rangle)$ for all linear functions $f : \mathcal{H} \to \mathbb{C}$. A plethora of different topologies defined relative to different convergences exists for sequences^ in $\mathcal{B}(\mathcal{H})$. The following definition classifies the particular topology useful for defining a quantum probability space.

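Definition 3.5. Consider a positive linear functional $\rho : \mathcal{B}(\mathcal{H}) \to \mathbb{C}$. It is called normal if $\rho(\sup_\alpha A_\alpha) = \sup_\alpha \rho(A_\alpha)$ for any upper bounded increasing net $(A_\alpha)$ of positive $A_\alpha \in \mathcal{B}(\mathcal{H})$. The normal topology on $\mathcal{B}(\mathcal{H})$ is defined by the family of seminorms $\{A \mapsto |\rho(A)| : \rho \text{ normal}\}$.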

Given this topology, we may define the algebra suitable for quantum probability 
spaces. 

Definition 3.6. A von Neumann algebra^ $\mathcal{A}$ is a *-subalgebra of $\mathcal{B}(\mathcal{H})$ which is closed in the normal topology. A state $\mathbb{P}$ on $\mathcal{A}$ is the restriction of a normal state on $\mathcal{B}(\mathcal{H})$ to $\mathcal{A}$.

^For topological spaces, we really consider the generalization of sequences called "nets", which are functions from a directed set to the topological space. Sequences are essentially nets where the directed set is the natural numbers. Generalizing to nets allows one to consider convergence in topological spaces which are not "first-countable", lacking a countable neighborhood basis for elements in the space. I'm already way out of my league on this one, so I defer to textbooks on topology for the real details.

^There are other equivalent ways to define a von Neumann algebra, often in terms of the weak and strong operator topologies, see [Redei and Summers 2007].

Of course, it might be tedious to study the topology of some group of operators 
whenever we are interested in defining a von Neumann algebra. Fortunately, the 
following theorem will enable us to generate a von Neumann algebra from a relevant 
set of operators. 

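Theorem 3.2 (Double Commutant Theorem (Theorem 3.8 in [Bouten et al. 2007a])). Let $\mathcal{S} \subset \mathcal{B}(\mathcal{H})$ be a self-adjoint set (if $S \in \mathcal{S}$ then $S^\dagger \in \mathcal{S}$). Then $\mathcal{A} = \mathcal{S}''$ is the smallest von Neumann sub-algebra in $\mathcal{B}(\mathcal{H})$ which contains $\mathcal{S}$.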

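Therefore in order to generate a von Neumann algebra, we look at the set of operators which commute with what commutes with the operators we started with, i.e. for $\mathcal{S} \subset \mathcal{B}(\mathcal{H})$ the generated von Neumann algebra is $(\mathcal{S} \cup \mathcal{S}^\dagger)''$, where $\mathcal{S}^\dagger$ is the set of adjoints of the elements of $\mathcal{S}$.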
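In finite dimensions, where the topological subtleties disappear, the double commutant can be computed directly by linear algebra. The sketch below is an illustration only, not a construction from the text: it finds the commutant of a set of matrices as the null space of the linear map $X \mapsto [S, X]$, then applies the same routine twice. The function name `commutant` and the tolerance are assumptions made for this example.

```python
import numpy as np


def commutant(ops, tol=1e-10):
    """Return a basis of matrices X with [S, X] = 0 for every S in ops."""
    d = ops[0].shape[0]
    eye = np.eye(d)
    # Row-major vectorization: vec(S X - X S) = (S kron I - I kron S^T) vec(X).
    M = np.vstack([np.kron(S, eye) - np.kron(eye, S.T) for S in ops])
    _, s, Vh = np.linalg.svd(M)
    rank = int(np.sum(s > tol))
    # Rows of Vh beyond the rank span the null space (after conjugation).
    return [v.conj().reshape(d, d) for v in Vh[rank:]]


sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

# One observable: the generated algebra is the commutative algebra of
# matrices that are diagonal in the sigma_z basis.
vn_sz = commutant(commutant([sz]))

# Two non-commuting observables: the commutant contains only multiples of
# the identity, so the double commutant is the full 2x2 matrix algebra.
comm_xz = commutant([sx, sz])
vn_xz = commutant(comm_xz)
```

Here `len(vn_sz)` is the dimension of the algebra generated by $\sigma_z$ alone, while `len(vn_xz)` is the dimension of the algebra generated by $\sigma_x$ and $\sigma_z$ together.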

Definition 3.7. $\mathcal{A} = \mathrm{vN}(A_1, \ldots, A_n)$ is the smallest von Neumann algebra generated by the observables $A_1, \ldots, A_n$.

With these definitions, one can now define a spectral theorem appropriate for 
infinite dimensional systems. 

Theorem 3.3 (Spectral Theorem (Theorem 3.3 in [Bouten et al. 2007a])).

Let $\mathcal{C}$ be a commutative von Neumann algebra. Then there exists a measure space $(\Omega, \mathcal{F}, \mu)$ and a *-isomorphism $\iota$ (up to $\mu$-a.s.) which maps from $\mathcal{C}$ to $L^\infty(\Omega, \mathcal{F}, \mu)$, the algebra of bounded functions on the measure space. A probability measure $\mathbf{P}$, absolutely continuous with respect to $\mu$, is defined via the normal state $\mathbb{P}$ on $\mathcal{C}$ as $\mathbb{P}(C) = \mathbb{E}_{\mathbf{P}}[\iota(C)]$ for all $C \in \mathcal{C}$.^

^The reason for using $\mu$ rather than $\mathbf{P}$ is that there will be $P \in \mathcal{C}$ such that $\mathbb{P}(P) = 0$, which renders $\iota$ not invertible on those null sets. That is also why the ultimate probability measure $\mathbf{P}$ is absolutely continuous with respect to $\mu$.



Chapter 3. Quantum Probability and Filtering 






The technical reasons for moving to von Neumann algebras and the normal topology are not particularly enlightening for us. In fact, throughout the rest of this thesis, we will rarely worry about the distinction between *-algebras and von Neumann algebras. Nonetheless, there are reasons why these choices were made and I encourage the interested reader to consult Section 3.1 in [Bouten et al. 2007a] for more discussion. The basic idea for choosing a von Neumann algebra is similar to the reason why one cannot generally use the power set of $\Omega$ in defining the $\sigma$-algebra for a classical probability space: it is "too big". By restricting to the normal topology, we guarantee that the von Neumann algebra is generated by its projections. Similarly, the restriction to normal states ensures monotone convergence of a sequence of observables, which is related to the countable additivity requirement we have for classical probability measures.

Definition 3.8. A quantum probability space is the pair $(\mathcal{N}, \mathbb{P})$ where $\mathcal{N}$ is a von Neumann algebra and $\mathbb{P}$ is a normal state on $\mathcal{N}$.

This is essentially identical to Definition 3.4, only with *-algebras generalized to von Neumann algebras and states generalized to normal states. As such, we would use it in the same way, selecting a commutative von Neumann subalgebra $\mathcal{A} \subset \mathcal{N}$ which corresponds to the observables we plan to measure in a given experimental realization. The statistics for those observables may then be calculated using the spectral theorem (Thm. 3.3). The essential point is that a commutative quantum probability space is identical to a classical probability space.

3.1.3 Quantum Random Variables 

Recall that in the discrete setting, quantum random variables were simply self-adjoint operators, whose spectral decomposition in terms of projectors was analogous to the decomposition of discrete classical random variables in terms of indicator functions of events. Generalizing this decomposition to the continuous setting proceeds analogously. We consider the quantum probability space $(\mathcal{N}, \mathbb{P})$ and select a particular self-adjoint $A \in \mathcal{N}$ which generates the commutative von Neumann algebra $\mathcal{A} = \mathrm{vN}(A) \subset \mathcal{N}$. From the spectral theorem (Thm. 3.3), we know that there exists a classical probability space $(\Omega, \mathcal{F}, \mathbf{P})$ and isomorphism $\iota$ that maps $A$ to a random variable on $\Omega$ which we write as $a : \Omega \to \mathbb{R}$. Since this is a continuous, real-valued random variable, we know that we can use the Borel algebra $\mathcal{B}$ to decompose $a$ into its events. That is, for some Borel set $B \in \mathcal{B}$, the event $a \in B$ corresponds to the set $\{\omega \in \Omega : a(\omega) \in B\} = a^{-1}(B) \in \mathcal{F}$. To map this back to the quantum space, we invert $\iota$. The projector that corresponds to this event, "$A$ takes on a value in $B$", is then written $P_A(B) = \iota^{-1}(\chi_{a \in B})$. The map $P_A$ is known as the spectral measure in functional analysis and allows us to decompose $A$ as

$$A = \int_{\mathbb{R}} \lambda\, P_A(d\lambda). \tag{3.22}$$



This is exactly the generalization of the finite-dimensional spectral decomposition in Eq. (3.4), where $\lambda$ plays the role of the eigenvalue and $P_A(d\lambda)$ plays the role of eigenprojectors. Again, we have the interpretation that any $f(A)$ can be trivially evaluated using this decomposition once we know which event, or equivalently which eigenspace, occurred.

Aside from the functional analysis machinery, bounded observables in the general case are treated in exactly the same way as finite-dimensional quantum observables. Unfortunately, many observables of interest in quantum mechanics are not described by bounded operators, most notably position and momentum. Although rigorous methods of dealing with such observables exist, I will only sketch a technique discussed in [Bouten et al. 2007a]. Our von Neumann algebra $\mathcal{N} \subset \mathcal{B}(\mathcal{H})$ contains only bounded operators and we need to somehow relate an unbounded operator $A$ to this algebra. To do so, define the operator $T_A = (A + iI)^{-1}$. Since $A$ is self-adjoint, it has a real spectrum, so we know that $A + iI$ is invertible and that its inverse $T_A$ is bounded. If $T_A \in \mathcal{N}$, we say $A$ is affiliated to $\mathcal{N}$. This is analogous to the classical notion of measurability, in that $A$ is not strictly in $\mathcal{N}$, but its value may be determined if we know the yes-no outcomes of events in $\mathcal{N}$. Since $A$ is a self-adjoint, linear operator it is trivially affiliated to $\mathcal{B}(\mathcal{H})$; if it is also bounded, then it is affiliated to $\mathcal{N}$ if and only if $A \in \mathcal{N}$.

In order to close the loop, we want to represent $A$ as a classical random variable using the spectral theorem, which was only developed for bounded functions. We note that the von Neumann algebra generated by $A$ is trivially $\mathrm{vN}(A) = \mathrm{vN}(T_A)$, since the identity operator doesn't change anything. Moreover, $T_A$ commutes with its adjoint, so $\mathrm{vN}(T_A)$ is commutative and bounded; we may therefore apply the spectral theorem, packaging $A$ in $T_A$, applying $\iota$ and then mapping back. That is, the classical (unbounded) random variable corresponding to $A$ is $\iota(A) = \iota(T_A)^{-1} - i$. From this, we can define the spectral measure $P_A$ using Eq. 3.22 and proceed without further worry. Given that this technique exists, we will not worry too much about unbounded operators and their domains throughout the rest of this thesis.
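The packaging of $A$ into $T_A$ can be made concrete with a finite matrix. Every finite matrix is bounded, so the sketch below only illustrates the algebra of the construction, not the unbounded case it is designed for. The random Hermitian matrix standing in for the observable and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 6
G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
A = (G + G.conj().T) / 2            # self-adjoint stand-in for an observable
Id = np.eye(d)

# T_A = (A + i)^{-1}; it exists because the spectrum of A is real.
T_A = np.linalg.inv(A + 1j * Id)

# Spectral picture: on each eigenvector of A with eigenvalue lam,
# T_A acts as multiplication by 1 / (lam + i).
lam, vecs = np.linalg.eigh(A)
T_from_spectrum = vecs @ np.diag(1.0 / (lam + 1j)) @ vecs.conj().T

# Undo the packaging: A = T_A^{-1} - i, the operator-level counterpart of
# iota(A) = iota(T_A)^{-1} - i.
A_recovered = np.linalg.inv(T_A) - 1j * Id
```

Since $|1/(\lambda + i)| \le 1$ for real $\lambda$, the operator norm of `T_A` never exceeds one, however large the eigenvalues of `A` become; this is the sense in which $T_A$ tames an unbounded $A$.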

Let's now consider two examples which will clarify the above definitions and 
which will prove useful when considering quantum white noise processes. 

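Example 3.3 (Example 3.9 in [Bouten et al. 2007a]). Let $\mathcal{H} = L^2(\mathbb{R})$, the vector space of square-integrable functions, and let $\mathcal{N} = \mathcal{B}(\mathcal{H})$. This is the Hilbert space for a continuous, one-dimensional quantum system, e.g. a particle on a line. We define the vector $|\psi\rangle \in \mathcal{H}$ in the position basis as

$$\psi(x) = \langle x|\psi\rangle = (2\pi\sigma^2)^{-1/4} \exp\!\left[-\frac{(x - \mu)^2}{4\sigma^2}\right]. \tag{3.23}$$

This pure state defines the quantum probability state $\mathbb{P}(X) = \langle\psi|X|\psi\rangle$, so that we now have a complete quantum probability space.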

From standard quantum mechanics, we are familiar with two (unbounded) observables on this space, position

$$(x\psi)(x) = x\,\psi(x) \tag{3.24}$$

and momentum

$$(p\psi)(x) = -i\hbar \frac{d}{dx}\psi(x). \tag{3.25}$$
Using our quantum probability machinery, we can consider what classical random variables these represent under the given state. Clearly $x$ is diagonal (affiliated to $L^\infty(\mathbb{R}) \subset \mathcal{N}$) and therefore the state tells us it is a Gaussian random variable with mean $\mu$ and variance $\sigma^2$. Alternatively, we could consider the characteristic function of $x$, written $\tilde{x}(k) = \mathbb{P}(e^{ikx})$. Calculating explicitly

$$\tilde{x}(k) = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} e^{ikx}\, e^{-(x - \mu)^2/2\sigma^2}\, dx = e^{ik\mu - \sigma^2 k^2/2}, \tag{3.26}$$

which we recognize as the characteristic function of a Gaussian random variable with mean $\mu$ and variance $\sigma^2$. Similarly, we can define the characteristic function of $p$ as $\tilde{p}(k) = \mathbb{P}(e^{ikp})$ and, recalling that $p$ is the generator of displacements in position, find

$$\tilde{p}(k) = \langle\psi|e^{ikp}|\psi\rangle = \int_{-\infty}^{\infty} dx\, \psi^*(x)\, \psi(x + \hbar k) = e^{-\hbar^2 k^2/8\sigma^2}, \tag{3.27}$$

which is also the characteristic function of a Gaussian, but with mean zero and variance $\hbar^2/4\sigma^2$. Note that $\Delta x \Delta p = \hbar/2$ as expected for the minimum uncertainty state $|\psi\rangle$.
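Both characteristic functions can be checked by direct quadrature. The sketch below assumes the real Gaussian wave function with position mean $\mu$ and variance $\sigma^2$, evaluates $\mathbb{P}(e^{ikx})$ as an integral against $|\psi|^2$, and evaluates $\mathbb{P}(e^{ikp})$ as the overlap of $\psi$ with its translate by $\hbar k$. The numerical values of `hbar`, `mu`, `sigma` and the grid are arbitrary illustrative choices.

```python
import numpy as np

hbar, mu, sigma = 1.0, 0.7, 1.3     # illustrative values (assumptions)


def psi(x):
    """Real Gaussian wave function with position mean mu, variance sigma**2."""
    return (2 * np.pi * sigma**2) ** -0.25 * np.exp(-(x - mu) ** 2 / (4 * sigma**2))


# Uniform grid wide enough that the Gaussian tails are negligible.
x = np.linspace(mu - 12 * sigma, mu + 12 * sigma, 20001)
dx = x[1] - x[0]


def char_x(k):
    """P(exp(i k x)) by a Riemann sum against |psi|^2."""
    return np.sum(np.exp(1j * k * x) * psi(x) ** 2) * dx


def char_p(k):
    """P(exp(i k p)); exp(i k p) shifts the argument of psi by hbar * k.
    psi is real here, so no complex conjugate is needed."""
    return np.sum(psi(x) * psi(x + hbar * k)) * dx


k = 0.9
expected_x = np.exp(1j * k * mu - sigma**2 * k**2 / 2)
expected_p = np.exp(-hbar**2 * k**2 / (8 * sigma**2))
```

Comparing `char_x(k)` with `expected_x` and `char_p(k)` with `expected_p` reproduces the two Gaussian characteristic functions, and the product of the implied standard deviations, $\sigma \cdot \hbar/2\sigma$, gives the minimum-uncertainty value $\hbar/2$.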

The final example in this section considers the Hilbert space of the harmonic oscillator, which, given its fundamental role in quantizing the electromagnetic field, serves as an important step towards quantum white noise processes which are prevalent in quantum optics. The following theorem will play an important part in characterizing operators on this space.

Theorem 3.4 (Stone's Theorem (Theorem 3.10 in [Bouten et al. 2007a])).

Let $\mathcal{N}$ be a von Neumann algebra and let $\{U_t\}_{t \in \mathbb{R}}$ be a strongly continuous group of unitary operators in $\mathcal{N}$. Then there is a unique self-adjoint $A$ affiliated to $\mathcal{N}$ called the Stone generator such that $U_t = e^{itA}$.

This theorem is often implicitly used in quantum mechanics when analyzing continuous symmetries, such as when identifying the Hamiltonian as the generator of time displacements.

Example 3.4 (Adapted from Example 3.11 in [Bouten et al. 2007a]). Let $\mathcal{N} = \mathcal{B}(\mathcal{H})$ with $\mathcal{H} = \ell^2(\mathbb{N})$, the set of square-summable functions on the non-negative integers. On this space, define the orthonormal number basis $|n\rangle$ with $n = 0, 1, \ldots$ where $\langle n|k\rangle = \delta_{nk}$. We further define the unnormalized exponential state for $\alpha \in \mathbb{C}$ as

$$|e(\alpha)\rangle = \sum_{n=0}^{\infty} \frac{\alpha^n}{\sqrt{n!}}\, |n\rangle, \tag{3.28}$$

which are the unnormalized form of the coherent states $|\alpha\rangle = |e(\alpha)\rangle e^{-|\alpha|^2/2}$. As we know from quantum mechanics, the exponential vectors provide an overcomplete basis for $\mathcal{H}$, which mathematically means their linear span $\mathcal{D}$ is dense in $\mathcal{H}$. As the last ingredient for our quantum probability space, we define the quantum probability states $\mathbb{P}_\alpha(X) = \langle\alpha|X|\alpha\rangle$.

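Let's consider some observables on this space. The most straightforward is the diagonal operator $\hat{n}$ which acts on number states as $\hat{n}|n\rangle = n|n\rangle$ and although unbounded is affiliated to $\ell^\infty(\mathbb{N}) \subset \mathcal{N}$. The spectral measure of $\hat{n}$ is simply $P_n(B)|k\rangle = \chi_B(k)|k\rangle$, and the corresponding event occurs with probability

$$\mathbb{P}_\alpha(P_n(B)) = \sum_{k \in B} |\langle k|\alpha\rangle|^2 = \sum_{k \in B} \frac{|\alpha|^{2k}}{k!}\, e^{-|\alpha|^2}. \tag{3.29}$$

Thus, the event "$\hat{n}$ takes on a value in $B$" occurs with the probability written above, suggesting that for coherent states, $\hat{n}$ is a Poisson-distributed random variable with intensity $|\alpha|^2$.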
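The Poisson statistics of the number observable are easy to confirm in a truncated number basis. The following sketch computes the amplitudes $\langle n|\alpha\rangle$ up to a cutoff and checks the normalization, mean and variance of the resulting distribution. The cutoff of 60 levels, the value of `alpha`, and the function names are assumptions chosen for illustration; the truncation error is negligible for the modest $|\alpha|$ used here.

```python
import numpy as np
from math import factorial


def coherent_amplitudes(alpha, n_max):
    """Number-basis amplitudes <n|alpha> for n = 0, ..., n_max - 1."""
    n = np.arange(n_max)
    norm = np.sqrt(np.array([factorial(int(m)) for m in n], dtype=float))
    return np.exp(-abs(alpha) ** 2 / 2) * alpha ** n / norm


alpha = 1.2 - 0.5j                  # illustrative value (assumption)
n_max = 60                          # truncation of the number basis (assumption)
n = np.arange(n_max)
probs = np.abs(coherent_amplitudes(alpha, n_max)) ** 2


def prob_event(B):
    """Probability that the number observable takes a value in the set B."""
    return sum(probs[k] for k in B)


mean_n = np.sum(n * probs)
var_n = np.sum(n**2 * probs) - mean_n**2
```

For a Poisson distribution the mean and variance coincide, so both `mean_n` and `var_n` should equal the intensity `abs(alpha)**2`.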

That is basically it for diagonal random variables in the number basis, but given our familiarity with the quantum harmonic oscillator, we suspect position and momentum observables are hiding somewhere as well. In light of Stone's theorem above and our prior knowledge that position and momentum are generators of displacements, we will construct them from a unitary representation of the translation group. Generally, we have a two-dimensional translation, which we implement with the unitary Weyl (or displacement) operator

$$W_\gamma |e(\alpha)\rangle = |e(\alpha + \gamma)\rangle\, e^{-\gamma^*\alpha - |\gamma|^2/2}, \tag{3.30}$$

where $\gamma \in \mathbb{C}$ determines the displacement in the complex plane. This is analogous to the standard displacement operator in quantum optics used to transform coherent states. One can verify the unitarity of $W_\gamma$ directly and further note that the Weyl operators form a group under multiplication since $W_\alpha W_\beta = W_{\alpha + \beta}\, e^{-i\,\mathrm{Im}(\alpha^*\beta)}$. Note that we have defined the action of $W_\gamma$ on the exponential vectors, from which their linear span may be used to extend the action to all of $\mathcal{H}$.

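In order to apply Stone's theorem, we need to turn this into a one parameter unitary group. As such, fix a particular $\beta \in \mathbb{C}$ and consider the one parameter group $\{W_{t\beta}\}_{t \in \mathbb{R}}$. It is continuous since $W_{t\beta}|e(\alpha)\rangle \to |e(\alpha)\rangle$ as $t \to 0$, so by Thm. 3.4, there exists a self-adjoint $B_\beta$ such that $W_{t\beta} = e^{itB_\beta}$. We can then ask for the distribution of this generator under the coherent state in terms of the characteristic function. Letting $\tilde{B}_\beta(k) = \mathbb{P}_\alpha(e^{ikB_\beta}) = \mathbb{P}_\alpha(W_{k\beta})$ be the characteristic function of $B_\beta$, we find

$$\tilde{B}_\beta(k) = \mathbb{P}_\alpha(W_{k\beta}) = \langle e(\alpha)|e(\alpha + k\beta)\rangle\, e^{-k\beta^*\alpha - k^2|\beta|^2/2 - |\alpha|^2} = e^{2ik\,\mathrm{Im}(\alpha^*\beta) - k^2|\beta|^2/2}, \tag{3.31}$$

which means that for coherent states, $B_\beta$ is a Gaussian random variable with mean $2\,\mathrm{Im}(\alpha^*\beta)$ and variance $|\beta|^2$.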

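We can also find an explicit representation of $B_\beta$ acting on the exponential vectors. Given the Stone representation of $W_{t\beta}$, this is simply

$$B_\beta |e(\alpha)\rangle = -i \frac{d}{dt} W_{t\beta}|e(\alpha)\rangle \Big|_{t=0} = i\beta^*\alpha\, |e(\alpha)\rangle - i \frac{d}{dt} |e(\alpha + t\beta)\rangle \Big|_{t=0}. \tag{3.32}$$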

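In order to recover the familiar harmonic oscillator operators, we need to explore particular $\beta$ values. Given that $x$ generates displacements in momentum, which is the imaginary axis in the complex plane, we set $x = B_i$. Similarly, since $p$ generates displacements in position, we set $p = B_{-1}$. Given these two operators, we can introduce the lowering operator $a = (x + ip)/2$, which from the representation of $B_\beta$ above means

$$\begin{aligned}
a|e(\alpha)\rangle &= \frac{1}{2}\left[\alpha|e(\alpha)\rangle - i\frac{d}{dt}|e(\alpha + ti)\rangle + \alpha|e(\alpha)\rangle + \frac{d}{dt}|e(\alpha - t)\rangle\right]_{t=0} && (3.33) \\
&= \alpha|e(\alpha)\rangle + \frac{1}{2}\sum_{n=0}^{\infty} \frac{|n\rangle}{\sqrt{n!}} \left[-i\frac{d}{dt}(\alpha + ti)^n + \frac{d}{dt}(\alpha - t)^n\right]_{t=0} && (3.34) \\
&= \alpha|e(\alpha)\rangle + \frac{1}{2}\sum_{n} \frac{|n\rangle}{\sqrt{n!}} \left[n(\alpha + ti)^{n-1} - n(\alpha - t)^{n-1}\right]_{t=0} && (3.35) \\
&= \alpha|e(\alpha)\rangle + \frac{1}{2}\sum_{n} \frac{|n\rangle}{\sqrt{n!}} \left[n\alpha^{n-1} - n\alpha^{n-1}\right] && (3.36) \\
&= \alpha|e(\alpha)\rangle. && (3.37)
\end{aligned}$$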



Thus, the lowering operator acts as expected on exponential (and by extension coherent) states and one can easily show that $a|n+1\rangle = \sqrt{n+1}\,|n\rangle$ as expected. One can also verify that the raising operator, which is the adjoint $a^\dagger$, acts as $a^\dagger|n\rangle = \sqrt{n+1}\,|n+1\rangle$ so that $\hat{n} = a^\dagger a$.

On the one hand, this example shows that all of the familiar observables and operators of the harmonic oscillator may be posed in the quantum probability framework. On the other hand, if one were to instead focus on a classical probability model for these observables, it seems unusual that both Poisson and Gaussian random variables emerge from the same state $\mathbb{P}_\alpha$, and moreover, there is a continuous map between the two via $x, p \mapsto (x - ip)(x + ip)/4 = \hat{n}$. One could never continuously transform two continuous random variables into a discrete random variable in classical probability theory. The reason we can do so here is that $x$, $p$ and $\hat{n}$ do not commute, indicating that we could never realize them in the same measurement and therefore need not worry about applying the spectral theorem to all simultaneously.

3.1.4 Quantum Conditional Expectation

As was the case for finite dimensional quantum systems, all the heavy lifting needed 
to construct a quantum conditional expectation is handled by the spectral theorem 
(Thm. 3.3), which relates a commutative von Neumann algebra to a classical proba- 
bility space. Additionally, all the details about including an explicit probe model for 
conditioning are no different than was the case in finite dimensions. I forego recount- 
ing those details and instead focus on some subtleties of the quantum conditional 
expectation that have heretofore been overlooked. For completeness, I first restate 
the quantum conditional expectation in terms of the quantum probability model. 

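Definition 3.9. Consider the quantum probability space $(\mathcal{N}, \mathbb{P})$ and let $\mathcal{A} \subset \mathcal{N}$ be a commutative von Neumann subalgebra. Then the map $\mathbb{P}(\cdot|\mathcal{A}) : \mathcal{A}' \to \mathcal{A}$ is (a version of) the conditional expectation if $\mathbb{P}(\mathbb{P}(B|\mathcal{A})A) = \mathbb{P}(BA)$ for all $A \in \mathcal{A}$ and $B \in \mathcal{A}'$.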

Firstly, what does "a version of" mean in this context? As is often the case for infinite dimensional systems, there is a freedom of definition for operators which have measure zero under the state $\mathbb{P}$. Thus, the uniqueness of conditional expectation in the quantum probability setting means that any two versions of $\mathbb{P}(B|\mathcal{A})$, call them $P$ and $Q$, satisfy $\|P - Q\|_{\mathbb{P}} = 0$ where $\|X\|_{\mathbb{P}}^2 = \mathbb{P}(X^\dagger X)$. If $P$ and $Q$ happen to differ on a part of Hilbert space where the state $\mathbb{P}$ has no support, then they would be different operators, but not in any important way relative to the conditional expectation.

Secondly, we only defined the spectral theorem for bounded, self-adjoint operators. For such operators, the conditional expectation is explicitly calculable as $\mathbb{P}(B|\mathcal{A}) = \iota^{-1}\big(\mathbb{E}_{\mathbf{P}}(\iota(B)|\sigma(\iota(\mathcal{A})))\big)$. Although we have discussed how to extend such a definition to unbounded operators, it is not clear how to find an explicit form for the conditional expectation when the operators are not self-adjoint. After all, such operators do not generally have a spectral decomposition, so the simple mapping through $\iota$ does not exist. But we can trivially decompose an operator in terms of its self-adjoint parts. That is, $B \in \mathcal{A}'$ may be written $B = B_1 + iB_2$, where $B_1 = (B + B^\dagger)/2$ and $B_2 = i(B^\dagger - B)/2$. Since $B_1, B_2$ are self-adjoint and since the conditional expectation is linear, we may use $\iota$ on $B_1$ and $B_2$ such that $\mathbb{P}(B|\mathcal{A}) = \mathbb{P}(B_1|\mathcal{A}) + i\,\mathbb{P}(B_2|\mathcal{A})$.



3.1.5 Quantum Bayes formula 

As we saw in Chapter 2, using the explicit formula for conditional expectation is 
not always convenient when working in infinite-dimensional spaces. This is also 
true for infinite dimensional quantum spaces, as the simple formula in Eq. (3.8) is 
often unwieldy, especially in the filtering problem, where conditional expectation 
calculations are often easier under a different measure. We therefore will often use 
the following quantum Bayes formula when performing inference. 

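Theorem 3.5 (Quantum Bayes Formula (Lemma 3.18 in [Bouten et al. 2007a])). Let $\mathcal{A}$ be a commutative von Neumann algebra and let $\mathcal{A}'$ be equipped with a normal state $\mathbb{P}$. Choose the reference operator $V \in \mathcal{A}'$ such that $V^\dagger V > 0$ and $\mathbb{P}(V^\dagger V) = 1$. Then we define a new state on $\mathcal{A}'$ by $\mathbb{Q}(X) = \mathbb{P}(V^\dagger X V)$ so that

$$\mathbb{Q}(X|\mathcal{A}) = \frac{\mathbb{P}(V^\dagger X V|\mathcal{A})}{\mathbb{P}(V^\dagger V|\mathcal{A})}, \qquad X \in \mathcal{A}'. \tag{3.38}$$

Proof. Let $K$ be an arbitrary element of $\mathcal{A}$. Then for all $X \in \mathcal{A}'$ we have

$$\begin{aligned}
\mathbb{P}\big(\mathbb{P}(V^\dagger X V|\mathcal{A})K\big) &= \mathbb{P}(V^\dagger X K V) && (3.39) \\
&= \mathbb{Q}(XK) && (3.40) \\
&= \mathbb{Q}\big(\mathbb{Q}(X|\mathcal{A})K\big) && (3.41) \\
&= \mathbb{P}\big(V^\dagger V\, \mathbb{Q}(X|\mathcal{A})K\big) && (3.42) \\
&= \mathbb{P}\big(\mathbb{P}(V^\dagger V\, \mathbb{Q}(X|\mathcal{A})K|\mathcal{A})\big) && (3.43) \\
&= \mathbb{P}\big(\mathbb{P}(V^\dagger V|\mathcal{A})\, \mathbb{Q}(X|\mathcal{A})K\big) && (3.44)
\end{aligned}$$

Since $K$ was general, this must be true for the other operators under the outermost $\mathbb{P}$, so that we read off Bayes formula by moving $\mathbb{P}(V^\dagger V|\mathcal{A})$ to the other side of the equation. In the manipulations above, we used the fact that $K$ commutes with all operators involved, the definition of the conditional expectation and the "module property" that $\mathbb{P}(AB|\mathcal{A}) = B\,\mathbb{P}(A|\mathcal{A})$ if $B \in \mathcal{A}$. □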

The definition (and proof) are very similar to the classical Bayes formula in Theorem 
2.6, but we do have an added interpretation in the quantum setting. Although V will 
not always be unitary, the transformation to the state Q is reminiscent of moving 
into an interaction picture, which is a common tool in standard quantum mechanics 
for simplifying calculations. We see such a change in the following example that is 
similar to the reference probability approach we will use in deriving the quantum 
filter. 
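A finite-dimensional instance makes the Bayes formula concrete. The sketch below is an illustration under stated assumptions, not the construction used later for the filter: it takes $\mathcal{A}$ to be spanned by two probe projectors $I \otimes P_j$, so that elements of $\mathcal{A}'$ are block operators $\sum_j X_j \otimes P_j$, and it uses the finite-dimensional expression $\mathbb{P}(X|\mathcal{A}) = \sum_j [\mathbb{P}(X (I \otimes P_j))/\mathbb{P}(I \otimes P_j)]\, I \otimes P_j$ for the conditional expectation. The random state, the random $V$ and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)


def rand_c(shape):
    """Random complex matrix used to build test operators."""
    return rng.normal(size=shape) + 1j * rng.normal(size=shape)


# Spin (x) two-level probe. The commutative algebra A is spanned by the
# probe projectors I (x) P_j; its commutant holds sum_j X_j (x) P_j.
I2 = np.eye(2)
P = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
proj = [np.kron(I2, Pj) for Pj in P]


def in_commutant(blocks):
    """Assemble sum_j X_j (x) P_j from a list of 2x2 blocks X_j."""
    return sum(np.kron(Xj, Pj) for Xj, Pj in zip(blocks, P))


# Arbitrary faithful state P(X) = Tr[rho X].
G = rand_c((4, 4))
rho = G @ G.conj().T
rho /= np.trace(rho).real


def state_P(X):
    return np.trace(rho @ X)


# Reference operator V in the commutant, scaled so that P(V^dag V) = 1.
V = in_commutant([rand_c((2, 2)), rand_c((2, 2))])
V = V / np.sqrt(state_P(V.conj().T @ V).real)


def state_Q(X):
    return state_P(V.conj().T @ X @ V)


def cond_coeffs(state, X):
    """Coefficients c_j in state(X | A) = sum_j c_j (I (x) P_j)."""
    return np.array([state(X @ Pj) / state(Pj) for Pj in proj])


X = in_commutant([rand_c((2, 2)), rand_c((2, 2))])
lhs = cond_coeffs(state_Q, X)
rhs = cond_coeffs(state_P, V.conj().T @ X @ V) / cond_coeffs(state_P, V.conj().T @ V)
```

The arrays `lhs` and `rhs` hold the coefficients of the two sides of the Bayes formula on each probe outcome; they agree because every $I \otimes P_j$ commutes with $V$.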

Example 3.5 (Example 3.19 in [Bouten et al. 2007a]). Consider modeling a Stern-Gerlach (SG) experiment in which we measure the spin state of an atom using its spatial degree of freedom. Following our previous examples, we define the spin degree of freedom for a spin-1/2 particle by the von Neumann algebra $\mathcal{B}(\mathbb{C}^2)$ spanned by the Pauli operators and we define the position degree of freedom along the $z$ axis by $\mathcal{B}(\ell^2(\mathbb{N}))$ with position operator $q$ and momentum operator $p$. Note that we are using the harmonic oscillator definitions from Example 3.4 rather than the $L^2(\mathbb{R})$ definition from Example 3.3, which are equivalent up to a change in units. Thus, the overall von Neumann algebra is $\mathcal{N} = \mathcal{B}(\mathbb{C}^2) \otimes \mathcal{B}(\ell^2(\mathbb{N}))$. We will assume that the initial states of the two degrees of freedom are uncorrelated so that we may write $\mathbb{P} = \mathbb{P}_\psi \otimes \mathbb{P}_0$, with $\mathbb{P}_\psi(X) = \langle\psi_0|X|\psi_0\rangle$ for an arbitrary spin state $|\psi_0\rangle$ and $\mathbb{P}_0(X) = \langle 0|X|0\rangle$. We choose the vacuum state as the initial position state, indicating the atom is initially at rest in a minimum uncertainty state.

Our simple model of the SG device corresponds to applying a magnetic field gradient that is linearly related to the spin along $z$ and the position along $z$. This will cause a displacement of the momentum relative to the spin state, so that measuring the momentum will provide an indirect measurement of $\sigma_z$. For simplicity, we will ignore the free Hamiltonian of the system which would transform the shift in momentum into a translation in position. That is, we assume we can measure momentum directly, so that it acts as a probe for the internal spin state of the atom. The unitary which describes the action of the magnetic field gradient is

$$U = \exp(i\kappa\, \sigma_z \otimes q) = P_{z,+1} \otimes e^{i\kappa q} + P_{z,-1} \otimes e^{-i\kappa q} = P_{z,+1} \otimes W_{i\kappa} + P_{z,-1} \otimes W_{-i\kappa} \tag{3.45}$$

where $\kappa$ represents the time integrated gradient in appropriate units. Since we intend to measure the momentum, we begin by considering the statistics of that measurement in terms of the characteristic function for $U^\dagger(I \otimes p)U$,

$$\begin{aligned}
\mathbb{P}\big(e^{ikU^\dagger(I \otimes p)U}\big) &= \mathbb{P}\big(U^\dagger(I \otimes W_{-k})U\big) && (3.46) \\
&= \mathbb{P}_\psi(P_{z,+1})\, \mathbb{P}_0(W_{-i\kappa}W_{-k}W_{i\kappa}) + \mathbb{P}_\psi(P_{z,-1})\, \mathbb{P}_0(W_{i\kappa}W_{-k}W_{-i\kappa}) && (3.47) \\
&= \mathbb{P}_\psi(P_{z,+1})\, e^{2i\kappa k - k^2/2} + \mathbb{P}_\psi(P_{z,-1})\, e^{-2i\kappa k - k^2/2} && (3.48)
\end{aligned}$$

where we have used the group property of the Weyl operator and calculations from Example 3.4. The characteristic function tells us that the atom's momentum distribution after the interaction is a sum of two Gaussians, each with unit variance but with means $\pm 2\kappa$ weighted by the probability of having spin up or down given by $\mathbb{P}_\psi(P_{z,\pm 1})$. Note that this distribution does not perfectly resolve the spin states. If our policy was to assign the spin state according to the sign of the observed momentum, there is some probability to assign the wrong spin state since the tails of the Gaussians overlap, as is seen in Fig. 3.2. This probability becomes smaller as the field gradient $\kappa$ becomes larger.
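The size of this assignment error follows directly from the Gaussian mixture. Given spin up, the observed momentum is Gaussian with mean $2\kappa$ and unit variance, so the sign rule fails with probability $\Phi(-2\kappa) = \tfrac{1}{2}\,\mathrm{erfc}(\sqrt{2}\,\kappa)$, and by symmetry the same holds for spin down. The short sketch below evaluates this expression; the function name is a hypothetical label for this example.

```python
from math import erfc, sqrt


def misassignment_probability(kappa):
    """Probability that a unit-variance Gaussian centred at +2*kappa is
    observed with a negative sign (the sign rule picks the wrong spin)."""
    return 0.5 * erfc(sqrt(2.0) * kappa)


# Error probability for a few illustrative gradient strengths.
errors = [misassignment_probability(k) for k in (0.0, 0.5, 1.0, 2.0)]
```

At $\kappa = 0$ the measurement carries no information and the sign rule is a coin flip, while the error falls off rapidly as $\kappa$ grows.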

Figure 3.2: Probability distribution for momentum measurement $U^\dagger(I \otimes p)U$ with $\kappa = 1$ in arbitrary units and initial spin state $|{+x}\rangle$.

Of course, the purpose of using the position degree of freedom as a probe for the spin degree is so that we may talk about the conditional expectation of spin observables, such as $\sigma_x$, given the indirect measurement. The two clearly commute since $[U^\dagger(I \otimes p)U, U^\dagger(\sigma_x \otimes I)U] = 0$. We therefore set $\mathcal{A} = \mathrm{vN}(U^\dagger(I \otimes p)U)$, for which $U^\dagger(\sigma_x \otimes I)U \in \mathcal{A}'$ so that $\mathbb{P}(U^\dagger(\sigma_x \otimes I)U|\mathcal{A})$ is well defined. Following the development of the quantum Bayes formula, we note that for a unitary $U$ and state $\mathbb{Q}(X) = \mathbb{P}(U^\dagger X U)$, the definition of the conditional expectation shows that $\mathbb{P}(U^\dagger X U|U^\dagger \mathcal{C} U) = U^\dagger \mathbb{Q}(X|\mathcal{C}) U$. For our problem, this means



$$\mathbb{P}\big(U^\dagger(\sigma_x \otimes I)U \,\big|\, \mathcal{A}\big) = U^\dagger\, \mathbb{Q}\big(\sigma_x \otimes I \,\big|\, \mathrm{vN}(I \otimes p)\big)\, U, \tag{3.49}$$

which is analogous to performing the conditional expectation calculation in the Schrödinger picture, then using $U$ to transform back to the Heisenberg picture. Note that $\mathcal{A} = U^\dagger\, \mathrm{vN}(I \otimes p)\, U$. Given that $U$ entangles the two subsystems of the atom, we expect the conditional expectation calculation would be easier under the state $\mathbb{Q}$ and a simple application of quantum Bayes rule would allow us to evaluate the desired conditional expectation. Unfortunately, $U$ does not commute with $I \otimes p$ ($U \notin \mathrm{vN}(I \otimes p)'$) so Bayes rule will not work in this form.

However, given that we are working in the vacuum state, we can perform some tricks to construct a related operator $V$ which does allow us to apply the Bayes rule. Specifically, the part of $U$ that gives us trouble is $e^{i\kappa q}$, which clearly does not commute with $p$. But given that $a|0\rangle = 0$, and writing $q = a + a^\dagger$ and $p = i(a^\dagger - a)$, we find

$$e^{i\kappa q}|0\rangle = e^{-\kappa^2/2}\, e^{i\kappa a^\dagger} e^{i\kappa a}|0\rangle = e^{-\kappa^2} e^{\kappa p}|0\rangle \tag{3.50}$$

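so that

$$\mathbb{P}_0(e^{-i\kappa q} X e^{i\kappa q}) = e^{-2\kappa^2}\, \mathbb{P}_0(e^{\kappa p} X e^{\kappa p}). \tag{3.51}$$

Thus in the vacuum, we can replace expressions involving $q$ with expressions involving $p$ without changing the results of any calculations. We can therefore replace $U$ by

$$V = e^{-\kappa^2} e^{\kappa\, \sigma_z \otimes p} = e^{-\kappa^2}\left(P_{z,+1} \otimes e^{\kappa p} + P_{z,-1} \otimes e^{-\kappa p}\right) \tag{3.52}$$

so that $\mathbb{Q}(X) = \mathbb{P}(U^\dagger X U) = \mathbb{P}(V^\dagger X V)$. Although $V$ is not unitary, it does commute with $I \otimes p$ so that we can apply the quantum Bayes formula to find

$$\mathbb{P}\big(U^\dagger(\sigma_x \otimes I)U \,\big|\, \mathcal{A}\big) = U^\dagger\, \frac{\mathbb{P}\big(V^\dagger(\sigma_x \otimes I)V \,\big|\, \mathrm{vN}(I \otimes p)\big)}{\mathbb{P}\big(V^\dagger V \,\big|\, \mathrm{vN}(I \otimes p)\big)}\, U. \tag{3.53}$$

Now note that

$$\begin{aligned}
V^\dagger(\sigma_x \otimes I)V = e^{-2\kappa^2}\big[& P_{z,+1}\sigma_x P_{z,+1} \otimes e^{2\kappa p} + P_{z,-1}\sigma_x P_{z,-1} \otimes e^{-2\kappa p} \\
&+ P_{z,+1}\sigma_x P_{z,-1} \otimes I + P_{z,-1}\sigma_x P_{z,+1} \otimes I \big] \qquad (3.54)
\end{aligned}$$

and

$$V^\dagger V = e^{-2\kappa^2}\left[P_{z,+1} \otimes e^{2\kappa p} + P_{z,-1} \otimes e^{-2\kappa p}\right]. \tag{3.55}$$

The common factor $e^{-2\kappa^2}$ cancels in the ratio of Eq. (3.53). Since $p$ is independent^ of any spin operator under $\mathbb{P}$, we use the module property to pull it through the conditional expectation and find

$$\begin{aligned}
\mathbb{P}\big(V^\dagger(\sigma_x \otimes I)V \,\big|\, \mathrm{vN}(I \otimes p)\big) = e^{-2\kappa^2}\big[& \mathbb{P}_\psi(P_{z,+1}\sigma_x P_{z,+1})\, e^{2\kappa p} + \mathbb{P}_\psi(P_{z,-1}\sigma_x P_{z,-1})\, e^{-2\kappa p} \\
&+ 2\,\mathrm{Re}\, \mathbb{P}_\psi(P_{z,-1}\sigma_x P_{z,+1}) \big] \qquad (3.56)
\end{aligned}$$

and

$$\mathbb{P}\big(V^\dagger V \,\big|\, \mathrm{vN}(I \otimes p)\big) = e^{-2\kappa^2}\left[\mathbb{P}_\psi(P_{z,+1})\, e^{2\kappa p} + \mathbb{P}_\psi(P_{z,-1})\, e^{-2\kappa p}\right]. \tag{3.57}$$

^Recall that $\mathbb{P}(B|\mathcal{A}) = \mathbb{P}(B)I$ if $B$ is independent of $\mathcal{A}$.

Wrapping these in $U^\dagger$ and $U$ gives the overall conditional expectation as

$$\mathbb{P}\big(U^\dagger(\sigma_x \otimes I)U \,\big|\, \mathcal{A}\big) = \frac{\mathbb{P}_\psi(P_{z,+1}\sigma_x P_{z,+1})\, e^{2\kappa U^\dagger(I \otimes p)U} + \mathbb{P}_\psi(P_{z,-1}\sigma_x P_{z,-1})\, e^{-2\kappa U^\dagger(I \otimes p)U} + 2\,\mathrm{Re}\, \mathbb{P}_\psi(P_{z,-1}\sigma_x P_{z,+1})}{\mathbb{P}_\psi(P_{z,+1})\, e^{2\kappa U^\dagger(I \otimes p)U} + \mathbb{P}_\psi(P_{z,-1})\, e^{-2\kappa U^\dagger(I \otimes p)U}}.$$

Although the result looks a bit unwieldy, it is actually rather straightforward. Once we perform the SG measurement of $\sigma_z$, we will have determined the value of $U^\dagger(I \otimes p)U$, which is then plugged into the above expression to immediately evaluate the conditional expectation of $U^\dagger(\sigma_x \otimes I)U$. As we saw in Fig. 3.2, this is not quite the same result the projection postulate would give for a projective measurement of $\sigma_z$, reflecting the physical nature that the SG device does not perfectly discriminate the $\sigma_z$ outcome for any finite $\kappa$. However, as $\kappa \to \infty$, we do recover the projection postulate.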
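Because the conditional expectation is a function of the commuting observable $U^\dagger(I \otimes p)U$, evaluating it for a given readout amounts to substituting the observed momentum $y$ for that operator. The sketch below does exactly this for an arbitrary spin observable and initial spin state; `conditional_spin` and its arguments are hypothetical names introduced for illustration. For the initial state $|{+x}\rangle$ the expression reduces to $1/\cosh(2\kappa y)$ for $\sigma_x$ and $\tanh(2\kappa y)$ for $\sigma_z$.

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
P_up = np.diag([1.0, 0.0]).astype(complex)    # projector onto sigma_z = +1
P_dn = np.diag([0.0, 1.0]).astype(complex)    # projector onto sigma_z = -1


def conditional_spin(S, psi, kappa, y):
    """Conditional expectation of the Hermitian spin observable S, evaluated
    at the momentum readout y, for initial spin state psi and gradient kappa."""

    def ev(X):
        return np.vdot(psi, X @ psi)

    num = (ev(P_up @ S @ P_up) * np.exp(2 * kappa * y)
           + ev(P_dn @ S @ P_dn) * np.exp(-2 * kappa * y)
           + 2 * np.real(ev(P_dn @ S @ P_up)))
    den = ev(P_up) * np.exp(2 * kappa * y) + ev(P_dn) * np.exp(-2 * kappa * y)
    return (num / den).real


plus_x = np.array([1, 1], dtype=complex) / np.sqrt(2)
sx_given_y = conditional_spin(sx, plus_x, kappa=1.0, y=0.3)
sz_given_y = conditional_spin(sz, plus_x, kappa=1.0, y=0.3)
```

For large $\kappa$ a typical readout sits near $\pm 2\kappa$, where the $\sigma_z$ value saturates at $\pm 1$ and the $\sigma_x$ value is driven to zero, which is the projective limit described above.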

So what was the point of this example? After all, the Born rule provides a perhaps more transparent way to calculate the same probabilities. While this is true, the fact is that we rarely have a truly projective measurement available; the quantum probability formalism allows us to handle these generalized measurements with ease. As a result, conditioning is again a consequence of conditional expectation, and no extra postulates are needed. Most importantly, these techniques are highly reminiscent of those used in developing the classical filtering equation and will therefore be essential when we develop the quantum filter.

3.1.6 Summary 

The purpose of this section was to lay the groundwork for performing inference in 
the quantum setting. By developing a quantum probability formalism, we found
that a commutative set of observables is identical to a classical probability theory, 
indicating we can easily leverage all the techniques we developed in Chapter 2. As
was the case classically, care must be taken for infinite dimensional spaces, but the
resulting tools are not substantively different. In developing the quantum conditional 
expectation, we found that inference is only possible between commutative observ- 
ables. This requires us to include a model for the probe quantum system within the 
quantum probability space, after which the familiar Born postulate for conditioning 
quantum systems simply pops out of conditional expectation calculations. Finally, 
we developed a quantum Bayes rule for relating conditional expectation calculations 
under different states. 



3.2 Quantum Stochastic Processes 

Heartened by our success in developing a quantum probability theory, we now con- 
sider developing a quantum analog of the classical stochastic processes discussed in 
Chapter 2. Given the broad applicability of classical white noise processes in describ- 
ing classical stochastic systems, we hope that an analogous quantum white noise and 
stochastic differential equation formalism will allow us to cast the quantum optics
filtering problem in similar language, after which we may extend the classical
filtering solution to the quantum case. 

There are then two separate issues to address in this section. First, we need to 
develop a mathematical description of quantum white noise processes in terms of a 
quantum probability model and similarly devise a quantum stochastic calculus for 
manipulating such processes. The second task is to connect this mathematical model 
to a physical one, in which quantum white noise arises naturally from a suitable limit 
of a standard physical model. Although both issues admit rigorous solutions, I will 
primarily focus on the details salient for solving the quantum optics filtering problem 
as was depicted in Fig. 3.1. There are several approaches for developing the funda- 
mental quantum noise processes, including starting with the classical processes and 
then extending to a quantum probability model or starting from a quantum model
and developing the quantum processes directly. We will follow the latter approach
as done in Barchielli [2003] and Bouten [2004]. This offers a more straightforward route
to quantum Brownian motion than the development in Bouten et al. [2007a], which
focuses on developing both Poisson and Gaussian quantum noise processes by generating
a quantum probability space from a classical probability space. The interested
reader should consult [Parthasarathy 2002] and [Accardi et al. 2002] for a thorough 
and rigorous presentation of the topics discussed within this section. 

3.2.1 Symmetric Fock Space 

Given our ultimate goal of describing experiments in quantum optics, the Hilbert 
space for our quantum probability space is naturally that for the quantum electro- 
magnetic field. In this section, we focus on the development of this space in terms 
of a single polarized mode of the field which may then be extended to describe the 
full quantized field over many spatial modes. The Hilbert space for a single photon
in this single mode is

\[ \mathcal{H} = L^2(\mathbb{R}; \mathbb{C}^2). \tag{3.59} \]

$\mathcal{H}$ is simply the space of square-integrable functions of time that return elements in $\mathbb{C}^2$.
Thus an element in $\mathcal{H}$ is a function $f_t \in L^2(\mathbb{R}; \mathbb{C}^2)$ which, for every time $t$, tells us
the polarization state of a single photon. If we fix an orthonormal polarization basis
$e_1, e_2$ in $\mathbb{C}^2$, then we can decompose these functions as $f_t = f_t^1 e_1 + f_t^2 e_2$, so that the
inner product is

\[ \langle f, g \rangle = \int_{\mathbb{R}} \left( f_t^{1*} g_t^1 + f_t^{2*} g_t^2 \right) dt. \tag{3.60} \]

Embedding time directly into the Hilbert space is perhaps unusual, since time usually
appears as a parameter via unitary evolution. Later, when tying this formalism to a
physical model, we will see that the explicit inclusion of time in $\mathcal{H}$ is analogous to an
interaction picture representation, where the states have an explicit time dependence
due to a free field evolution. For now at least, we take this approach so that the
resulting quantum stochastic processes, which are merely families of operators on $\mathcal{H}$
indexed by time, are defined in analogy to classical stochastic processes.

States of the electromagnetic field mode involve potentially many photons, which
are best considered in the second quantized picture, where the Hilbert space is the
symmetric Fock space

\[ \mathcal{F} = \mathbb{C} \oplus \bigoplus_{n=1}^{\infty} \mathcal{H}^{\otimes_s n}. \tag{3.61} \]

Each direct sum term corresponds to a sector with a fixed number of photons, e.g.
the zero photon sector, the one photon sector, and so on; within a given sector, we use
the symmetrized tensor product $\otimes_s$, which ensures that only symmetric states of
the constituent photons are possible (which must be true for bosons). Following our
approach in Example 3.4, we define the exponential vectors for $f \in \mathcal{H}$ as

\[ |e(f)\rangle = 1 \oplus \bigoplus_{n=1}^{\infty} \frac{f^{\otimes n}}{\sqrt{n!}}, \tag{3.62} \]

which when normalized are the coherent vectors $|\psi(f)\rangle = \exp(-\tfrac{1}{2}\|f\|^2)|e(f)\rangle$. Note
that $\langle e(f)|e(g)\rangle = \exp\langle f, g\rangle$. The linear span of these vectors is dense in $\mathcal{F}$, so that
we may define the action of operators on them and extend to all other states. The coherent
vectors are analogous to coherent states of the harmonic oscillator, which we recall had number
state amplitudes related to powers of the complex number $\alpha$. For the coherent vectors
$|\psi(f)\rangle$, this generalizes to having the same single particle state $f$ for each photon in
the different sectors, where this state is specified over all time $t$. An important state
for our purposes is the vacuum vector $|\Phi\rangle = |\psi(0)\rangle = |e(0)\rangle = 1 \oplus 0 \oplus 0 \oplus \cdots$, which
defines the vacuum state $\mathbb{P}_\phi(A) = \langle\Phi|A|\Phi\rangle$ for $A \in \mathcal{B}(\mathcal{F})$.
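As a small numerical sanity check (not part of the original development), the overlap formula for exponential vectors can be verified in the simplest setting. The sketch below assumes a single mode, i.e. it replaces $\mathcal{H}$ by $\mathbb{C}$ so that $f$ is just a complex number, and truncates the Fock space at a finite photon number; the function names and the cutoff `n_max` are illustrative choices only.

```python
import math
import numpy as np

def exponential_vector(alpha, n_max=40):
    """Truncated single-mode analogue of |e(f)>: n-th component alpha**n / sqrt(n!)."""
    n = np.arange(n_max + 1)
    sqrt_factorials = np.sqrt([float(math.factorial(int(k))) for k in n])
    return np.power(complex(alpha), n) / sqrt_factorials

def coherent_vector(alpha, n_max=40):
    """Normalized exponential vector, exp(-|alpha|^2 / 2) |e(alpha)>."""
    return np.exp(-0.5 * abs(alpha) ** 2) * exponential_vector(alpha, n_max)

f, g = 0.7 + 0.2j, -0.3 + 0.5j

# <e(f)|e(g)> should equal exp(<f, g>) with <f, g> = conj(f) * g.
overlap = np.vdot(exponential_vector(f), exponential_vector(g))
expected = np.exp(np.conj(f) * g)

# The coherent vector should have unit norm up to truncation error.
norm = np.linalg.norm(coherent_vector(f))

print(abs(overlap - expected), abs(norm - 1.0))
```

For amplitudes of order one the truncation error at `n_max=40` is far below floating-point precision, so both printed differences should be negligible.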

The quantum probability space is then defined by the von Neumann algebra
$\mathcal{N} = \mathcal{B}(\mathcal{F})$, the set of bounded operators on the symmetric Fock space, with vacuum
state $\mathbb{P}_\phi$. Before studying specific operators in this space, we note that it admits a
natural decomposition, analogous to those for classical filtrations, called a continuous
tensor product structure

\[ \mathcal{F} = \mathcal{F}_{s]} \otimes \mathcal{F}_{[s,t]} \otimes \mathcal{F}_{[t} \tag{3.63} \]

for $0 < s < t$. This continuous decomposition also holds for the von Neumann
algebra

\[ \mathcal{N} = \mathcal{N}_{s]} \otimes \mathcal{N}_{[s,t]} \otimes \mathcal{N}_{[t} = \mathcal{B}(\mathcal{F}_{s]}) \otimes \mathcal{B}(\mathcal{F}_{[s,t]}) \otimes \mathcal{B}(\mathcal{F}_{[t}) \tag{3.64} \]

and the exponential vectors

\[ |e(f)\rangle = |e(f_{s]})\rangle \otimes |e(f_{[s,t]})\rangle \otimes |e(f_{[t})\rangle. \tag{3.65} \]

Thus the operator process $\{X_t\}$ affiliated to $\mathcal{N}$ is adapted if $X_t$ is affiliated to $\mathcal{N}_{t]}$
for every $t$, which is equivalent to it having the form $X_t \otimes I$ on $\mathcal{N}_{t]} \otimes \mathcal{N}_{[t}$.



3.2.2 Quantum White Noise 

Given the close analogy of the symmetric Fock space to the harmonic oscillator space
considered in Example 3.4, we expect to find a Gaussian operator process by studying
Weyl transformations of the exponential vectors. Taking this analogy seriously, pick
a $g \in \mathcal{H}$ and define the Weyl operator $W(g) \in \mathcal{B}(\mathcal{F})$ as

\[ W(g)|e(f)\rangle = e^{-\frac{1}{2}\|g\|^2 - \langle g, f\rangle}|e(f+g)\rangle. \tag{3.66} \]

Recall that the harmonic oscillator Weyl operator performed a translation in $\mathbb{C}$ by
some complex number $\gamma$; the Weyl operator here extends this to a translation in
$L^2(\mathbb{R}; \mathbb{C}^2)$ by the single photon state $g$. Note that the Weyl operators form a group
via the relation

\[ W(f)W(g) = e^{-i\,\mathrm{Im}\langle f, g\rangle}W(f+g), \qquad f, g \in \mathcal{H}. \tag{3.67} \]

From the continuous tensor product structure, we see that $W(g\chi_{[0,t]})$ is an adapted
operator process.

In order to apply Stone's theorem to find the generators of these translations, we
pick a particular $g \in \mathcal{H}$ and study the one parameter group $\{W(tg)\}_{t \in \mathbb{R}_+}$. Stone's
theorem then tells us that there exists a self-adjoint $B(g)$ affiliated to $\mathcal{N}$ such that

\[ W(tg) = e^{itB(g)}. \tag{3.68} \]

The operators $B(g)$ are known as field operators, which we will later tie to the more
familiar electromagnetic field operators. For now, we continue in the tradition of
Example 3.4 and consider the statistics of these operators under the vacuum state.
Their characteristic function is

\[ \mathbb{P}_\phi(e^{ikB(g)}) = \mathbb{P}_\phi(W(kg)) = \langle e(0)|e(kg)\rangle e^{-\frac{1}{2}k^2\|g\|^2} = e^{-\frac{1}{2}k^2\|g\|^2}, \tag{3.69} \]

which indicates that $B(g)$ is a mean zero Gaussian random variable with variance $\|g\|^2$.
Note that if we were to use an arbitrary coherent state, rather than the vacuum
state, the field operators would still be Gaussian with the same variance, but with
non-zero mean.
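The Weyl relation and the Gaussian vacuum statistics can be checked numerically in a toy setting. The sketch below again assumes a single mode ($\mathcal{H} = \mathbb{C}$), where the Weyl operator reduces to the familiar displacement operator $\exp(\gamma a^\dagger - \gamma^* a)$ on a truncated number basis; the names `weyl`, `annihilation` and the cutoff `DIM` are hypothetical helpers, not notation from the text.

```python
import numpy as np

DIM = 60  # photon-number cutoff (illustrative choice)

def annihilation(dim=DIM):
    """Truncated annihilation operator in the number basis."""
    return np.diag(np.sqrt(np.arange(1, dim)), k=1).astype(complex)

def weyl(gamma, dim=DIM):
    """Single-mode Weyl operator exp(gamma a^dagger - conj(gamma) a)."""
    a = annihilation(dim)
    # Write the exponent as -i H with H Hermitian, then exponentiate via eigh.
    h = 1j * (gamma * a.conj().T - np.conj(gamma) * a)
    evals, evecs = np.linalg.eigh(h)
    return (evecs * np.exp(-1j * evals)) @ evecs.conj().T

vac = np.zeros(DIM, dtype=complex)
vac[0] = 1.0

f, g = 0.4 + 0.3j, -0.2 + 0.6j

# Weyl relation W(f) W(g) = exp(-i Im<f, g>) W(f + g), tested on the vacuum.
lhs = weyl(f) @ (weyl(g) @ vac)
rhs = np.exp(-1j * np.imag(np.conj(f) * g)) * (weyl(f + g) @ vac)
group_error = np.linalg.norm(lhs - rhs)

# Vacuum characteristic function <vac| W(k g) |vac> = exp(-k^2 |g|^2 / 2).
k = 1.3
char_fn = np.vdot(vac, weyl(k * g) @ vac)
gaussian = np.exp(-0.5 * k**2 * abs(g) ** 2)

print(group_error, abs(char_fn - gaussian))
```

Because the displacements used here are small compared with the cutoff, truncation effects should be negligible and both printed errors should sit near machine precision.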

Now in order to identify this with a classical stochastic process via the spectral
theorem, we need to consider a commutative operator process. Specifically, consider
the operator process $\{B_t^{\varphi,q} = B(e^{i\varphi}e_q\chi_{[0,t]})\}$ for a fixed phase $\varphi$ and polarization $q$. By
construction, this is an adapted process and from the continuous tensor product
structure, we further know that $B(e^{i\varphi}e_q\chi_{[s,t]})$ is affiliated to $\mathcal{N}_{[s,t]}$. Therefore
increments over non-overlapping intervals commute and since $B_0^{\varphi,q} = 0$, we know that any pair
$B_s^{\varphi,q}$ and $B_t^{\varphi,q}$ commute. Thus $\mathrm{vN}(\{B_t^{\varphi,q}\}_{t \in \mathbb{R}_+})$ is a commutative von Neumann
algebra and, from the spectral theorem, is equivalent to a classical stochastic process. But
we also know from Eq. (3.69) that the increment $B_t^{\varphi,q} - B_s^{\varphi,q}$ ($t > s$) is a mean zero
Gaussian random variable with variance $t - s$ when in the vacuum state. The continuous
tensor product structure further implies that increments over non-overlapping time intervals
are statistically independent, so that Definition 2.18 tells us $\iota(B_t^{\varphi,q})$ is identically a
classical Wiener process! Indeed, by varying $\varphi$, we see that an entire family of Wiener
processes may be constructed. However, they do not generally commute with each
other, so that only one may be identified in a given realization.

There is a particular set of these quantum Wiener processes which we now identify.
Let $Q_t^q = B(ie_q\chi_{[0,t]})$, $P_t^q = B(-e_q\chi_{[0,t]})$ and $A_t^q = (Q_t^q + iP_t^q)/2$. These are analogous
to the position, momentum and annihilation operators introduced in Example 3.4
and correspond to different quadratures of each polarization mode of the quantum
electromagnetic field. These allow us to introduce the fundamental noises

\[ A_t^q|e(f)\rangle = \left(\int_0^t f_q(s)\,ds\right)|e(f)\rangle \tag{3.70} \]

\[ \langle e(g)|A_t^{q\dagger}|e(f)\rangle = \left(\int_0^t g_q^*(s)\,ds\right)\langle e(g)|e(f)\rangle \tag{3.71} \]

\[ \langle e(g)|\Lambda_t^{qr}|e(f)\rangle = \left(\int_0^t g_q^*(s)f_r(s)\,ds\right)\langle e(g)|e(f)\rangle \tag{3.72} \]

where $f_q(s)$ denotes the component of $f_s$ along $e_q$. The creation and annihilation
processes $A_t^{q\dagger}$ and $A_t^q$ are simply combinations of the quantum Wiener processes
we just studied and are formally related to the familiar interaction picture
Bose field operators for the single mode via

\[ A_t^q = \int_0^t a_q(s)\,ds, \qquad A_t^{q\dagger} = \int_0^t a_q^\dagger(s)\,ds \tag{3.73} \]

where $[a_q(t), a_r^\dagger(s)] = \delta_{qr}\delta(t-s)$ and all other commutators are zero. Given the delta-time
correlation, we see that these canonical field operators are singular objects,
in analogy to the usual delta distribution definition of classical white noise. The
remaining gauge or scattering process $\Lambda_t^{qr}$ may also be represented in terms of the
usual Bose fields as

\[ \Lambda_t^{qr} = \int_0^t a_q^\dagger(s)a_r(s)\,ds. \tag{3.74} \]

As detailed in [Bouten et al. 2007a], the diagonal entries $\Lambda_t^{qq}$ correspond to counting
quanta in a given polarization mode and can be related to classical Poisson processes
when the field is in a coherent state, recovering the other stochastic process expected
in generalizing Example 3.4.

3.2.3 Quantum Stochastic Calculus



Since we are ultimately interested in describing quantum stochastic processes driven
by the fundamental noises, e.g. systems with a formal Hamiltonian $H(t) = H_0 +
H_1\dot{Q}_t + H_2\dot{P}_t + H_3\dot{\Lambda}_t$, our next task is to develop an appropriate quantum stochastic
integral and calculus, keeping in mind the mathematical issues we had to handle in
the classical case. Note that in order to more clearly focus on the essentials, I have
restricted consideration to a single polarization mode by dropping the polarization
index on the fundamental noises; it should be clear how to generalize the following
to account for multiple polarization modes. Suppose we were only interested in integrals
with respect to a single quadrature, say $Q_t$, where the integrands are adapted
quantum stochastic processes that commute with $Q_t$. Given this commutative set,
we inherit the Itô integral and calculus definitions through the spectral theorem,
so that all the mathematical subtleties are handled by our classical construction in
Chapter 2. Of course, we are really interested in processes which are driven by all
three fundamental noises, which do not commute with each other and therefore do
not admit a simultaneous classical probability mapping. Following Bouten et al.
[2007a], I will sketch the development of quantum stochastic calculus due to
Hudson and Parthasarathy [1984] and Parthasarathy [2002], noting many of
the technical issues involved but not delving into the details.

Recalling the physical picture we have in mind (Fig. 3.1), we see that there are
really two physical systems to consider: the optical field and the atoms. Letting
$(\mathcal{N}_f, \mathbb{P}_\phi)$ be the quantum probability space for the electromagnetic field in the vacuum
state that was developed in the last section, we similarly need to define the
quantum probability space $(\mathcal{N}_s, \mathbb{P}_s)$ for the atomic system. Generalizing slightly, we
set $\mathcal{N}_s = \mathcal{B}(\mathcal{H}_s)$ and $\mathbb{P}_s(A) = \mathrm{Tr}[A\rho]$, where $\mathcal{H}_s$ is some finite dimensional Hilbert
space and $\rho$ is a corresponding density matrix on that space. Thus the overall space,
$(\mathcal{N}_s \otimes \mathcal{N}_f, \mathbb{P}_s \otimes \mathbb{P}_\phi)$, allows us to study how the quantum noises couple to the atomic
system and how the two jointly evolve. This will be made more precise later in this
section and for the time being, we focus on the mathematical problem of defining
integrals of the form $\int_0^t L_s\,dM_s$ where $M_t$ is one of the fundamental noises and $L_t$ is
an adapted process, which here means it is affiliated to $\mathcal{N}_s \otimes \mathcal{N}_{f,t]}$. Often, $L_t$ will be
trivially adapted and correspond to a time-independent operator on the atomic system
which acts as the identity on the entire Fock space.

Footnote (on the formal Hamiltonian at the start of this section): "formal" here means
that a careful interpretation of the time derivative of a quantum Wiener process is required.

The approach we take in defining quantum stochastic integrals follows the one
taken classically; we begin by defining the integral for simple processes and then look
to define arbitrary integrals as a suitable limit of simple approximations. First, recall
that for $s < t$, any of the fundamental noise increments $M_t - M_s$ are affiliated to
$\mathcal{N}_{f,[s,t]}$. Given that $L_t$ is affiliated to $\mathcal{N}_s \otimes \mathcal{N}_{f,t]}$ for every $t$ by assumption, this means we may write
$L_s \otimes (M_t - M_s) = L_s(M_t - M_s) = (M_t - M_s)L_s$, i.e. the processes commute and
there are no issues in multiplying these unbounded operators. This is analogous to
the non-anticipative property of the classical Wiener increment. Simple processes $L_t$
are those whose values change at the fixed sequence of times defined by the partition
$\pi_n$ of $[0,t]$, e.g.

\[ L_t = \sum_{t_i \in \pi_n} L_{t_i}\chi_{[t_i, t_{i+1})}(t) \tag{3.75} \]

so that we may define the quantum stochastic integral for these simple processes as

\[ \int_0^t L_s\,dM_s = \sum_{t_i \in \pi_n} L_{t_i}(M_{t_{i+1}} - M_{t_i}). \tag{3.76} \]

As was the case classically, the difficulty now is to extend this definition to arbitrary
$L_t$ in terms of an approximating sequence of simple processes $L_t^n$ whose stochastic
integrals converge to give a unique integral for the initial process $L_t$.
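To make the simple-process sum concrete, the following sketch evaluates the left-endpoint (non-anticipative) sum in the commutative case, where a single quadrature can be replaced by a classical Wiener path via the spectral theorem. It is only an illustration under that assumption: the integrand is taken to be the path itself, the random seed and step count are arbitrary, and `ito_sum` is a hypothetical helper name.

```python
import numpy as np

def ito_sum(integrand, path):
    """Left-endpoint sum: sum_i L_{t_i} (M_{t_{i+1}} - M_{t_i})."""
    increments = np.diff(path)
    # Only values of the integrand at the left end of each interval are used.
    return float(np.sum(integrand[:-1] * increments))

rng = np.random.default_rng(0)
n_steps, t_final = 1000, 1.0

# Classical Wiener path sampled on a uniform partition of [0, t_final].
dW = rng.normal(0.0, np.sqrt(t_final / n_steps), size=n_steps)
W = np.concatenate(([0.0], np.cumsum(dW)))

# Integrate the path against itself using the simple-process sum.
integral = ito_sum(W, W)

# Exact algebraic identity for the left-endpoint sum on any partition:
# sum_i W_i (W_{i+1} - W_i) = (W_T^2 - sum_i (dW_i)^2) / 2.
identity = 0.5 * (W[-1] ** 2 - np.sum(dW**2))

print(integral, identity, np.sum(dW**2))
```

The sum of squared increments concentrates near `t_final` as the partition is refined, which is the origin of the Itô correction term in $\int W\,dW = (W_t^2 - t)/2$.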

More concretely, consider the set of adapted processes $(E, F, G, H)$ which admit
the simple approximations $(E^n, F^n, G^n, H^n)$. We want to define the integral

\[ I_t = \int_0^t E_s\,d\Lambda_s + F_s\,dA_s + G_s\,dA_s^\dagger + H_s\,ds \tag{3.77} \]

as a suitable limit of the corresponding simple approximations $I_t^n$ over the simple
processes. Recall that classically, we were able to use the Itô isometry (Lem. 2.3)
to define this limit uniquely in $L^2$. Taking the same approach here is not quite so
straightforward. For example, let $\mathcal{H}_s = \mathbb{C}$ so that it may be ignored for the time
being; then the seminorm is given by $\|X\|_\phi^2 = \langle\Phi|X^\dagger X|\Phi\rangle$. Mean square convergence of
$I_t^n$ then corresponds to $\|I_t - I_t^n\|_\phi^2 = \langle\Phi|(I_t - I_t^n)^\dagger(I_t - I_t^n)|\Phi\rangle \to 0$ as $n \to \infty$.
Such convergence is clearly sensitive to the particular state $|\Phi\rangle$. Does this mean the
domain of $I_t$ is only the vacuum? That is, how does $I_t$ act on vectors orthogonal to
the vacuum if it is only defined relative to convergence in the vacuum state?

There are many inequivalent ways out of this ambiguity and we follow the approach
of Hudson and Parthasarathy [Hudson and Parthasarathy 1984]. We fix the
domain of $I_t$ to be $\mathcal{H}_s \otimes \mathcal{D}$ from the start, where $\mathcal{D}$ is the linear span of exponential
vectors $|e(f)\rangle$. $I_t$ is then the unique operator on this domain such that
$(\langle v| \otimes \langle\psi|)(I_t - I_t^n)^\dagger(I_t - I_t^n)(|v\rangle \otimes |\psi\rangle) \to 0$ for every $|v\rangle \in \mathcal{H}_s$, $|\psi\rangle \in \mathcal{D}$. This corresponds
to a simultaneous mean square limit for all states in our fixed domain. Hudson and
Parthasarathy show that this limit exists as long as $\int_0^t \|(E_s - E_s^n)|v\rangle \otimes |\psi\rangle\|^2\,ds \to 0$ as
$n \to \infty$ for all $|v\rangle \in \mathcal{H}_s$, $|\psi\rangle \in \mathcal{D}$, and likewise for $F, G, H$. Additionally, they show that
every square-integrable process, e.g. $\int_0^t \|E_s|v\rangle \otimes |\psi\rangle\|^2\,ds < \infty$ for all $|v\rangle \in \mathcal{H}_s$, $|\psi\rangle \in \mathcal{D}$,
admits a simple process approximation. Thus on the fixed domain, we have essentially
the same stochastic integral construction we did classically, in which square-integrable
processes admit a simple approximation, the integrals of which
admit a unique limit as long as these approximations converge independently of the
choice of approximation for each of $E, F, G, H$ on all states in the domain.

Definition 3.10. The quantum Itô integral for the adapted and square-integrable
processes $(E, F, G, H)$, written

\[ I_t = \int_0^t E_s\,d\Lambda_s + F_s\,dA_s + G_s\,dA_s^\dagger + H_s\,ds, \tag{3.78} \]

is uniquely defined on $\mathcal{H}_s \otimes \mathcal{D}$ as a limit of simple approximations.

Note that for the vacuum reference state, we have the nice property that $\Lambda_t|\Phi\rangle =
A_t|\Phi\rangle = 0$, so that in the vacuum the $E_t, F_t$ terms go to zero. Similarly, since $A_t^\dagger$
acts to the left as $A_t$ does to the right, we further know that the $G_t$ term is zero
in vacuum expectation, although $A_t^\dagger|\Phi\rangle \neq 0$. Therefore just like the classical Itô
integral, the quantum Itô integral is entirely "deterministic", i.e. not driven by the
quantum noises, in vacuum expectation.

Rather than working with the quantum Itô integral, we often write a corresponding
quantum stochastic differential equation (QSDE)

\[ dI_t = E_t\,d\Lambda_t + F_t\,dA_t + G_t\,dA_t^\dagger + H_t\,dt \tag{3.79} \]

which is really just a shorthand representation for the integral in Definition 3.10. The
differential notation is retained to remind us of the singular nature of the stochastic
processes, which don't have a well-defined standard time derivative. As the quantum
generalization of the classical stochastic differential equation, we can also study the
transformation rules of QSDEs. Again, defining such properties is done relative
to a particular domain, as it is not clear a priori that e.g. the product of integrals
$I_tJ_t$ is a well-defined operator on the domain $\mathcal{H}_s \otimes \mathcal{D}$. The insight of Hudson and
Parthasarathy is to use the fact that the adjoint $I_t^\dagger$ is well-defined when restricted to
our fixed domain, so that the expression for $I_tJ_t$ is read off from examining the matrix
elements $(\langle v'| \otimes \langle\psi'|I_t)(J_t|v\rangle \otimes |\psi\rangle)$ for arbitrary states in the domain. This gives
rise to the quantum generalization of the Itô rule (Thm. 2.3).

Footnote (on the adjoint): Taking the physicist's perspective, I have been very casual in
using the adjoint $\dagger$ in place of the Hilbert space adjoint, independent of the domain of
the operators. It is not generally true that the domain of the adjoint of an operator
coincides with the domain of the operator itself. Therefore the Hudson-Parthasarathy
approach involves more care than I let on, but the details are not so relevant for this
introduction. The reader should consult the references [Bouten et al. 2007a;
Parthasarathy 2002] for more rigor.

Theorem 3.6 (Quantum Itô rule, Theorem 4.2 in [Bouten et al. 2007a]).
Let $(F^{ij}, G^i, H^i, I)$, $(B^{ij}, C^i, D^i, E)$ and $(B^{ij\dagger}, C^{i\dagger}, D^{i\dagger}, E^\dagger)$ be integrable stochastic
processes where the latter are adjoint pairs. Consider the stochastic integrals with
QSDEs

\[ dX_t = B_t^{ij}\,d\Lambda_t^{ij} + C_t^i\,dA_t^i + D_t^i\,dA_t^{i\dagger} + E_t\,dt \tag{3.80} \]

\[ dY_t = F_t^{ij}\,d\Lambda_t^{ij} + G_t^i\,dA_t^i + H_t^i\,dA_t^{i\dagger} + I_t\,dt \tag{3.81} \]

(3.82) 

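where repeated polarization indices are summed over. The stochastic process $Z_t =
X_tY_t$ satisfies the QSDE

\[ dZ_t = X_t\,dY_t + dX_t\,Y_t + dX_t\,dY_t \tag{3.83} \]

where the differential products are evaluated using the fundamental Itô table

\[
\begin{array}{c|cccc}
dM_1 \backslash dM_2 & dA_t^{k\dagger} & d\Lambda_t^{kl} & dA_t^k & dt \\
\hline
dA_t^{i\dagger} & 0 & 0 & 0 & 0 \\
d\Lambda_t^{ij} & \delta_{jk}\,dA_t^{i\dagger} & \delta_{jk}\,d\Lambda_t^{il} & 0 & 0 \\
dA_t^i & \delta_{ik}\,dt & \delta_{ik}\,dA_t^l & 0 & 0 \\
dt & 0 & 0 & 0 & 0
\end{array}
\]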

Theorem 3.6 provides us with a simple set of algebraic rules for manipulating prod- 
ucts of quantum stochastic differential equations and makes the transformation of 
complicated stochastic processes almost trivial. It is worth noting that the Hudson- 
Parthasarathy construction was a particular choice which leads to a useful quantum 
stochastic calculus that describes many interesting physical setups (as we will soon 
see). Nonetheless, there are open mathematical questions of how to generalize the 
approach or what alternate constructions may be useful. 
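The algebraic character of the Itô table can be illustrated with a few lines of code. The sketch below assumes a single polarization mode and scalar (commuting) coefficients, so that an increment is just a tuple of coefficients of $d\Lambda_t$, $dA_t$, $dA_t^\dagger$ and $dt$; the `Increment` type and `ito_product` function are hypothetical names introduced only for this example.

```python
from typing import NamedTuple

class Increment(NamedTuple):
    """Coefficients of dLambda, dA, dA^dagger and dt (single mode, scalar coefficients)."""
    lam: complex = 0
    a: complex = 0
    adag: complex = 0
    t: complex = 0

def ito_product(x: Increment, y: Increment) -> Increment:
    """Second-order product dX dY using the non-zero entries of the Ito table:
    dLambda dLambda = dLambda, dLambda dA^dagger = dA^dagger,
    dA dLambda = dA, dA dA^dagger = dt; every other product vanishes."""
    return Increment(
        lam=x.lam * y.lam,
        a=x.a * y.lam,
        adag=x.lam * y.adag,
        t=x.a * y.adag,
    )

# Quadrature increments, using A = (Q + iP)/2 so that Q = A + A^dagger, P = i(A^dagger - A).
dQ = Increment(a=1, adag=1)
dP = Increment(a=-1j, adag=1j)

print(ito_product(dQ, dQ))  # coefficient of dt is 1: dQ dQ = dt, as for a classical Wiener process
print(ito_product(dQ, dP))  # coefficient of dt is +i
print(ito_product(dP, dQ))  # coefficient of dt is -i, so the two quadratures do not commute
```

Each quadrature on its own obeys the classical rule $dQ_t^2 = dt$, while the mixed products differ in sign, which is the infinitesimal statement that the quadratures do not commute.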



3.2.4 Quantum Stochastic Limit 



The main lingering question is whether the quantum white noise processes we have
developed are actually useful in describing physical systems of interest. After all,
there is no utility gained in developing a quantum stochastic calculus if we can't use
it in practice! Obviously, we have been working with a particular physical model in
mind, in which the quantum white noise processes are operators on the electromagnetic
field. We thus need to formally tie the physical quantum model one usually
would write down for such a setup to the abstract mathematical model considered
above. Although there are several approaches one might consider, we follow that of
Accardi et al. [2002], who lay out a very general method for deriving quantum white
noise approximations of a broad range of system-reservoir interactions. That is, one
considers a Hamiltonian of the form

\[ H = H_0 + \lambda H_I \tag{3.84} \]

where $H_0 = H_S + H_R$ is the free system and reservoir Hamiltonian and $H_I$ is the
interaction Hamiltonian modulated by the coupling parameter $\lambda$. We are interested
in regimes where the coupling is weak, $\lambda \to 0$, but where its effect builds up over
long times, $t \to \infty$; essentially considering simultaneously the long-term regime
of scattering theory and the weak-effect regime of perturbation theory. Although
Accardi et al. refer to this regime as the quantum singular limit, it is also known as
the quantum Markov limit, the van Hove limit, the quantum stochastic limit and the
quantum central limit [Accardi et al. 1990; Gardiner and Collett 1985; Gough 1999;
2005; Van Hove 1955]. The quantum central limit name is particularly appealing
since we are interested in the cumulative effect of many infinitesimal interactions,
much as the classical central limit considers the accumulation of many infinitesimal
random kicks, e.g. our construction of the Wiener process as the limit of a random
walk in Eq. (2.39).

In a physical sense, the quantum stochastic limit is related to a separation of
timescales of an interacting system. One timescale is the relaxation time $t_R$, which
is the characteristic decay time of the correlation $\langle H_I(0)H_I(t)\rangle$, where $H_I(t) =
e^{itH_0}H_Ie^{-itH_0}$ is the interaction Hamiltonian with respect to the free evolution. The
slow degrees of freedom have characteristic decay time $t_S$ with respect to the correlation
$\langle X(0)X(t)\rangle - \langle X(0)\rangle\langle X(t)\rangle$, where $X(t) = e^{itH_0}Xe^{-itH_0}$ and $X$ is an arbitrary
observable. There is also the interaction time $t_{\mathrm{int}}$, which again describes the decay
of correlations of observables $X(t)$, but now where $X(t)$ is evolved under the total
Hamiltonian $H$. A typical scenario has $t_R \ll t_{\mathrm{int}} \ll t_S$, so that the fast degrees of
freedom, when considered relative to the slow degrees of freedom, look completely
uncorrelated and are therefore well described as a white noise. This should surely be
the case for the quantum optics systems, where the vacuum fluctuations occur on a
much faster timescale than atomic interactions.

The quantum stochastic limit then attempts to find the form of both $U_t$ and
$H_I(t)$ in the following sense

\[ \frac{dU_t^{(\lambda)}}{dt} = -i\lambda H_I(t)U_t^{(\lambda)} \;\longrightarrow\; \frac{dU_t}{dt} = -iH_I(t)U_t. \tag{3.85} \]

This is notably different than the standard quantum Markov approximation taken
in deriving a quantum master equation, as done e.g. in Walls and Milburn [2008,
Chapter 6]. Rather than finding effective dynamics for a reduced system, i.e. just
the atoms, the quantum stochastic limit finds effective dynamics for the full system,
i.e. both the atoms and field. This is particularly useful for the quantum filter, in
which we want to measure the electromagnetic field in order to perform inference on
the atomic system; if it were eliminated in the weak limit, we would have nothing
left to measure!

Clearly, there must be some relationship between $t$ and $\lambda$ in this limit, since taking
$\lambda \to 0$ independently would completely decouple the interaction, so that the interaction
term $\lambda H_I(t) \to 0$ in the free Hamiltonian interaction picture. The following lemma shows that the
only sensible scaling is to set $t \mapsto t/\lambda^2$ and study just the $\lambda \to 0$ limit.

Lemma 3.1 (Lemma 1.8.1 in [Accardi et al. 2002]). Let $\langle\cdot\rangle$ represent expectation
with respect to some fixed state and suppose that $H_I(t)$ as described above is mean
zero, time-invariant and integrable:

\[ \langle H_I(t)\rangle = 0 \tag{3.86} \]

\[ \langle H_I(t_1 + s)\cdots H_I(t_n + s)\rangle = \langle H_I(t_1)\cdots H_I(t_n)\rangle \tag{3.87} \]

\[ \int_{-\infty}^{\infty} dt\,|\langle H_I(0)H_I(t)\rangle| < \infty \tag{3.88} \]

Then the expectation value of the second-order term in a perturbative series

\[ -\lambda^2\int_0^t dt_1\int_0^{t_1} dt_2\,\langle H_I(t_1)H_I(t_2)\rangle, \tag{3.89} \]

has a finite nonzero limit as $\lambda \to 0$, $t \to \infty$ if and only if

\[ \lim_{\lambda \to 0,\, t \to \infty} \lambda^2 t = \tau \tag{3.90} \]

for some finite, non-zero constant $\tau$. In this case, the limit is

\[ -\tau\int_{-\infty}^{0} ds\,\langle H_I(0)H_I(s)\rangle. \tag{3.91} \]

Proof. By the time-translation invariance property, we may rewrite the second-order
integral as

\[ -\lambda^2\int_0^t dt_1\int_0^{t_1} dt_2\,\langle H_I(0)H_I(t_2 - t_1)\rangle. \tag{3.92} \]

Setting $s_2 = t_2 - t_1$ this can further be rewritten as

\[ -\lambda^2\int_0^t dt_1\int_{-t_1}^{0} ds_2\,\langle H_I(0)H_I(s_2)\rangle. \tag{3.93} \]

Now setting $s_1 = \lambda^2 t_1$, we have

\[ -\int_0^{\lambda^2 t} ds_1\int_{-s_1/\lambda^2}^{0} ds_2\,\langle H_I(0)H_I(s_2)\rangle. \tag{3.94} \]

Clearly $s_1 > 0$, so that as $\lambda \to 0$ the inner integral tends to

\[ \int_{-\infty}^{0} ds\,\langle H_I(0)H_I(s)\rangle \tag{3.95} \]

which is independent of $s_1$ since $s_1/\lambda^2 \to \infty$ independent of the value of $s_1$. This
decouples the two integrals and leaves only the outer one, which tends to zero as $\lambda \to 0$
unless the upper limit $\lambda^2 t \to \tau$ as given in the lemma, recovering Eq. (3.91). □
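The scaling in Lemma 3.1 can be illustrated with a concrete correlation function. The sketch below assumes, purely for illustration, an exponentially decaying correlation $\langle H_I(0)H_I(s)\rangle = e^{-|s|/T}$ (this specific form and the names `second_order_term`, `stochastic_limit` and `corr_time` are not from the text), for which the double integral in Eq. (3.89) can be done in closed form and compared with the limit in Eq. (3.91).

```python
import numpy as np

def second_order_term(lam, tau, corr_time=1.0):
    """-lam^2 int_0^t dt1 int_0^{t1} dt2 C(t2 - t1) with t = tau / lam^2
    and the assumed correlation C(s) = exp(-|s| / corr_time)."""
    t = tau / lam**2
    T = corr_time
    # Inner integral over s2 = t2 - t1 in [-t1, 0]:  T * (1 - exp(-t1 / T)).
    # Outer integral over t1 in [0, t]:              T * (t - T * (1 - exp(-t / T))).
    return -(lam**2) * T * (t - T * (1.0 - np.exp(-t / T)))

def stochastic_limit(tau, corr_time=1.0):
    """-tau int_{-inf}^0 ds C(s) = -tau * corr_time for the same correlation."""
    return -tau * corr_time

tau = 2.0
for lam in (0.5, 0.1, 0.01):
    print(lam, second_order_term(lam, tau), stochastic_limit(tau))
```

With $\lambda^2 t$ held fixed at $\tau$, the second-order term approaches the limiting value as $\lambda \to 0$, with a correction of order $\lambda^2$ in this example.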
We see that a non-trivial limit implies that for times of order $t/\lambda^2$, the interaction
produces effects of order $\tau$ and thus $\lambda$ serves as a natural timescale for the problem.
It is of note that this limit can be performed for all terms in the Dyson perturbation
series, which may then be re-summed to give the effective stochastic propagator on
the right hand side of Eq. (3.85).

As a prototypical example, we now study the stochastic limit for a single two-level
atom coupled to the quantized electromagnetic field. This will allow us to focus on
the relevant details of the stochastic limit rather than issues involving representation
of the reduced dipole operator for complicated atoms. For a more general derivation,
the reader is enthusiastically encouraged to consult Chapters 2-5 of Accardi et al.
[2002]. The general procedure is to first identify the free and interaction Hamiltonians
in order to determine the interaction picture propagator in Eq. (3.85). We then make
the replacement $t \mapsto t/\lambda^2$ and study the time-correlations of the suitably rescaled
interaction picture field mode operators $a_t(\lambda), a_t^\dagger(\lambda)$. The hope is that in the $\lambda \to 0$
limit, the correlation $\langle a_t(\lambda)a_s^\dagger(\lambda)\rangle \to \delta(t - s)$, allowing us to reconstruct the quantum
Wiener processes in a fashion analogous to Eq. (3.73). I refer the reader to [Walls
and Milburn 2008] for more detail on developing the quantized electromagnetic field
and the dipole Hamiltonian given below.

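We begin by introducing the free atom Hamiltonian

\[ H_A = \frac{\hbar\omega_{eg}}{2}\left(|e\rangle\langle e| - |g\rangle\langle g|\right) = \frac{\hbar\omega_{eg}}{2}\sigma_z, \tag{3.96} \]

where I have used the usual isomorphism between an arbitrary two-level system and
the Pauli operators. The free electromagnetic-field Hamiltonian is

\[ H_F = \int d^3k \sum_q \hbar\omega_{\mathbf{k}}\left(a_{\mathbf{k},q}^\dagger a_{\mathbf{k},q} + \tfrac{1}{2}\right), \tag{3.97} \]

where $\omega_{\mathbf{k}} > 0$, $q$ is the polarization index, and the mode operators satisfy $[a_{\mathbf{k},q}, a_{\mathbf{k}',q'}^\dagger] =
\delta^3(\mathbf{k} - \mathbf{k}')\delta_{qq'}$. The dipole interaction Hamiltonian is given by

\[ H_I = -\mathbf{d}\cdot\mathbf{E}(\mathbf{r}) \tag{3.98} \]

with the quantum electromagnetic field written as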

E(r)=^E+(r)-iE-(r)=iJ] J ci3k^(k)y^ak,,ek„e*'^-"- + h.c. (3.99) 

Here, $\mathbf{e}_{\mathbf{k},q}$ are polarization vectors and $g(\mathbf{k}) > 0$ is a mode function to account for
the spatial variation of the optical field. Although we leave it general, we assume it
is integrable and infinitely differentiable.

In the dipole approximation, we take $\mathbf{r} = 0$ and write the dipole operator in
terms of the atomic energy eigenstates: $\mathbf{d} = \langle g|\mathbf{d}|e\rangle(\sigma_- + \sigma_+) = \mathbf{d}_{ge}(\sigma_- + \sigma_+)$ with
$\sigma_- = |g\rangle\langle e|$, $\sigma_+ = |e\rangle\langle g|$. We then rewrite the interaction Hamiltonian as

\[ H_I = -i\sum_q\int d^3k\,\hbar g_{\mathbf{k},q}a_{\mathbf{k},q}(\sigma_- + \sigma_+) + \text{h.c.} \tag{3.100} \]

with the newly defined coupling strength 



Note that I have not taken the usual rotating-wave approximation, which drops
non-energy-conserving terms such as $a_{\mathbf{k},q}\sigma_-$. These will end up dropping out as part
of the stochastic limit and, given our interest in a white noise process with infinite
spectral bandwidth, it would be inconsistent to neglect these terms at the outset
(even though we would get the same result).

We next use the fact that

\[ e^{it(H_A + H_F)/\hbar}\,a_{\mathbf{k},q}\,e^{-it(H_A + H_F)/\hbar} = a_{\mathbf{k},q}e^{-i\omega_{\mathbf{k}}t} \tag{3.103} \]

to rewrite the interaction Hamiltonian in the interaction picture with respect to the
free Hamiltonian $H_0 = H_A + H_F$:

\[ H_I(t) = -i\sum_q\int d^3k\,\hbar g_{\mathbf{k},q}\left(a_{\mathbf{k},q}\sigma_-e^{-i(\omega_{\mathbf{k}} + \omega_{eg})t} + a_{\mathbf{k},q}\sigma_+e^{-i(\omega_{\mathbf{k}} - \omega_{eg})t}\right) + \text{h.c.} \tag{3.104} \]

Plugging into Eq. (3.85) and rescaling $t \mapsto t/\lambda^2$, we have

f-X^4w" (3.105) 

= [-hi~ax,-^Jt)a. + ~ax,+^Jt)a+) + h.c] f/f ^ (3.106) 

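where I have introduced the rescaled time-domain field operators

\[ \tilde{a}_{\lambda,\omega}(t) = \frac{1}{\lambda}\sum_q\int d^3k\,g_{\mathbf{k},q}a_{\mathbf{k},q}e^{-i(\omega_{\mathbf{k}} - \omega)t/\lambda^2}. \tag{3.107} \]

In order to characterize the behavior of these operators in the limit $\lambda \to 0$, we study
their correlation with respect to the vacuum field,

\[ \lim_{\lambda \to 0}\langle\Phi|\tilde{a}_{\lambda,\omega}(t)\tilde{a}_{\lambda,\omega}^\dagger(s)|\Phi\rangle = \lim_{\lambda \to 0}\int d^3k\sum_q|g_{\mathbf{k},q}|^2\frac{1}{\lambda^2}e^{-i(\omega_{\mathbf{k}} - \omega)(t - s)/\lambda^2} \tag{3.108} \]

\[ = \delta(t - s)\int d^3k\sum_q|g_{\mathbf{k},q}|^2\,2\pi\delta(\omega_{\mathbf{k}} - \omega) \tag{3.109} \]

\[ = \kappa(\omega)\delta(t - s) \tag{3.110} \]

where we used the identity $\lim_{\lambda \to 0}e^{-i\omega t/\lambda^2}/\lambda^2 = 2\pi\delta(t)\delta(\omega)$, which in turn allows
the introduction of $\kappa(\omega) = 2\pi\int d^3k\sum_q|g_{\mathbf{k},q}|^2\delta(\omega_{\mathbf{k}} - \omega)$, the mode function evaluated at $\omega_{\mathbf{k}} =
\omega$. This suggests that the limit of $\tilde{a}_{\lambda,+\omega_{eg}}(t)$ is a delta-correlated quantum Wiener
process with strength $\sqrt{\kappa(\omega_{eg})}$. However, looking at the limit for $\tilde{a}_{\lambda,-\omega_{eg}}(t)$ requires
evaluating the coupling strength at $\omega_{\mathbf{k}} = -\omega_{eg}$, which is impossible since $\omega_{\mathbf{k}} > 0$ by
definition. In fact, one can show that these non-energy conserving terms go to zero
in the weak coupling limit, thus recovering the rotating wave approximation.

Footnote (on the delta-function identity): Following Proposition 1.2.1 in Accardi et al. [2002],
we can easily demonstrate this with respect to two Schwartz functions, which are infinitely
differentiable and whose derivatives decrease to zero faster than any polynomial. Using the
test functions $\psi(t)$, $\phi(\omega)$, we have

\[ \frac{1}{\lambda^2}\int dt\,\psi(t)\int d\omega\,\phi(\omega)e^{-i\omega t/\lambda^2}. \tag{3.111} \]

Setting $t = \lambda^2\tau$ this becomes

\[ \int d\tau\,\psi(\lambda^2\tau)\int d\omega\,\phi(\omega)e^{-i\omega\tau} = \sqrt{2\pi}\int d\tau\,\psi(\lambda^2\tau)\hat{\phi}(\tau) \tag{3.112} \]

where $\hat{\phi}$ is the Fourier transform of $\phi$. We then have

\[ \lim_{\lambda \to 0}\sqrt{2\pi}\int d\tau\,\psi(\lambda^2\tau)\hat{\phi}(\tau) = \sqrt{2\pi}\,\psi(0)\int d\tau\,\hat{\phi}(\tau) = 2\pi\psi(0)\phi(0) \tag{3.113} \]

which shows that this is equivalent to $\lim_{\lambda \to 0}e^{-i\omega t/\lambda^2}/\lambda^2 = 2\pi\delta(t)\delta(\omega)$ in the sense of
distributions.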

Although we have demonstrated that the correlations of the rescaled field oper- 
ators limit to those for a quantum Gaussian white noise (cf. Eq. (3.73)), there is 
considerably more effort required to find the limit of the interaction picture
propagator $U_t^{(\lambda)}$, which involves studying the convergence of each term in the Dyson
series expansion. Several chapters of Accardi et al. [2002] are devoted to this task,
indicating it is certainly beyond the scope of this overview. Instead, we simply quote the
perhaps unsurprising result given by the following quantum Stratonovich propagator



$$ dU_t = \sqrt{\kappa}\, dA_t^\dagger \circ \sigma_- U_t - \sqrt{\kappa}\, dA_t \circ \sigma_+ U_t \qquad (3.114) $$



where I have set $\kappa = \kappa(\omega_{eg})$. The Stratonovich increments are defined as they
were classically in Definition 2.21, except we now have to worry about operators
not commuting. Unlike the quantum Ito formulation, the quantum Stratonovich
noise increments do not commute with adapted operators, e.g. $O \circ dA_t \neq dA_t \circ O$,
although they do transform via the normal chain rule. The fact that the stochastic
limit of a standard quantum differential equation is a quantum Stratonovich equation
is precisely the quantum generalization of the classical Wong-Zakai Theorem (2.4).
As was the case then, we can still convert to the following quantum Ito propagator
or stochastic propagator



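$$ dU_t = \left\{ \sqrt{\kappa}\,\sigma_-\, dA_t^\dagger - \sqrt{\kappa}\,\sigma_+\, dA_t - \tfrac{1}{2}\kappa\,\sigma_+\sigma_-\, dt \right\} U_t. \qquad (3.115) $$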



The upside of the Ito form is that the quantum noise increments commute with 
adapted processes and are zero in vacuum expectation, although one now needs to 
use the quantum Ito chain rule rather than the normal calculus chain rule. 
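
As a quick consistency check on the Ito form, one can verify that $U_t$ stays unitary. The sketch below assumes the vacuum Ito table used earlier in the chapter, $dA_t\,dA_t^\dagger = dt$ with all other products of $dA_t$, $dA_t^\dagger$ and $dt$ vanishing, and uses the fact that the Ito increments commute with the adapted $U_t$.

```latex
\begin{align*}
dU_t &= \left\{\sqrt{\kappa}\,\sigma_-\,dA_t^\dagger - \sqrt{\kappa}\,\sigma_+\,dA_t
        - \tfrac12\kappa\,\sigma_+\sigma_-\,dt\right\}U_t,\\
dU_t^\dagger &= U_t^\dagger\left\{\sqrt{\kappa}\,\sigma_+\,dA_t - \sqrt{\kappa}\,\sigma_-\,dA_t^\dagger
        - \tfrac12\kappa\,\sigma_+\sigma_-\,dt\right\},\\
d(U_t^\dagger U_t) &= dU_t^\dagger\,U_t + U_t^\dagger\,dU_t + dU_t^\dagger\,dU_t\\
 &= U_t^\dagger\left\{-\kappa\,\sigma_+\sigma_-\,dt\right\}U_t
  + U_t^\dagger\left\{\kappa\,\sigma_+\sigma_-\,dA_t\,dA_t^\dagger\right\}U_t = 0.
\end{align*}
```

The first-order noise terms cancel pairwise, and the Ito correction $-\tfrac12\kappa\,\sigma_+\sigma_-\,dt$ is exactly what is needed to cancel the second-order term.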
3.2.5 Summary

We have briefly developed the quantum generalizations of stochastic noise processes 
and stochastic differential equations discussed in Chapter 2. Certainly, that is the 
take away message from this section — that in spirit, the quantum versions are really 
no different than their classical counterparts. As such, I have not stressed many of the 
statistical features and intuitions we focused on classically because they are more or 
less the same. The main differences come down to the non-commutativity of quantum 
mechanics, but as we saw when developing the quantum Ito integral, the solutions 
amount to a careful extension of the classical approach. The more novel discussion
was devoted to developing a physical model of quantum white noise since, unlike
the classical case, we now have a very particular class of physical systems in
mind. We found that for such quantum optics systems, the quantum electromagnetic
field serves as an excellent model of quantum white noise. Moreover, the quantum 
stochastic description arises naturally from a weak coupling limit of standard physical 
models. In the following section, we will use these techniques to carefully define and 
solve the quantum optics filtering problem. 



3.3 Quantum Filtering Theory 



By this point, it should come as no surprise that a broad class of models in quantum 
optics are well described by the quantum stochastic formalism. One such setup 
is shown schematically in Figure 3.3, in which an input field, described in terms
of quantum Wiener processes $dA_t, dA_t^\dagger$, interacts with a cloud of atoms. Although
the two systems are initially unentangled, a joint interaction such as the one in
Eq. (3.115) would correlate them. Therefore, later measurements of the scattered light
field should contain some information about the cloud of atoms, although they will
also contain the inherent quantum noise fluctuations of the optical field. The task of
the quantum filter is to process this continuous measurement stream in order to best
estimate the state of the atomic system. In this section, we formalize this problem
using our newly gained quantum probability and quantum stochastic skills and then
derive the quantum filtering equation in analogy to the reference probability method
we used to derive the classical filtering equation.

[Figure 3.3 panels: Input Field ($dA_t, dA_t^\dagger$); Atoms ($j_t(X) = U_t^\dagger X U_t$); Output Field
($U_t^\dagger A_t U_t,\ U_t^\dagger A_t^\dagger U_t$); Measurements ($Y_t = U_t^\dagger (A_t + A_t^\dagger) U_t$); Filter
($\pi_t[X] = \mathbb{P}[j_t(X) | \mathcal{Y}_{[0,t]}]$).]

Figure 3.3: Schematic of continuous measurement in quantum optics, in which
light scattered by a cloud of atoms is continuously measured by a photodetector
whose output is then filtered. Based on Fig. 5.1 in [Bouten et al. 2007a].

3.3.1 Statement of the Filtering Problem 

Following the classical approach, we would like to pose the quantum filtering problem 
in analogy to the systems-observations pair of Eqs. (2.75) and (2.76), where now the 
system corresponds to the state of the atoms and the observations correspond to the 
measurements of the field. Both are described by the quantum probability model 
considered in Subsection 3.2.3. The time evolution and corresponding QSDEs of both 
field and atom observables are determined by the quantum stochastic propagator for 
the experiment under consideration. Based on Eq. (3.115), we will consider the generic
stochastic propagator

$$ dU_t = \left\{ L\, dA_t^\dagger - L^\dagger dA_t - \tfrac{1}{2} L^\dagger L\, dt - iH\, dt \right\} U_t \qquad (3.116) $$

where $L$ is some atomic operator that results in the weak coupling limit and $H$ is
an arbitrary atomic Hamiltonian. We also fix the quantum state as the product
state $\rho \otimes |\Phi\rangle\langle\Phi|$, where $\rho$ is an arbitrary atomic state and $|\Phi\rangle$ is the usual vacuum
state. Note that we could equally well consider coherent field states by noting that
a coherent state is obtained by applying a Weyl operator $W_t$ to the vacuum $|\Phi\rangle$,
suggesting that we could explicitly include the Weyl operator in our
dynamical equations and work with a vacuum reference state. In particular, taking
$U_t \mapsto U_t W_t$ and applying the quantum Ito rules to $d(U_t W_t)$ would give a new propagator
which describes the displaced dynamics. Since this is not an essential part of the
filtering problem, we will not dwell on it here.

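Instead, we now focus on the atomic evolution given in terms of the quantum
flow or Heisenberg evolution (see the footnote below), written as $j_t(X) = U_t^\dagger (X \otimes I) U_t$
for any atomic observable $X$. Using the quantum Ito rules and the fact that an explicitly
time-independent observable satisfies $dX = 0$, $j_t(X)$ satisfies the following QSDE:

$$ dj_t(X) = dU_t^\dagger\, X U_t + U_t^\dagger X\, dU_t + dU_t^\dagger\, X\, dU_t \qquad (3.117) $$
$$ = U_t^\dagger \left[ L^\dagger X\, dA_t - L X\, dA_t^\dagger - \tfrac{1}{2} L^\dagger L X\, dt + iHX\, dt \right] U_t
   + U_t^\dagger \left[ X L\, dA_t^\dagger - X L^\dagger\, dA_t - \tfrac{1}{2} X L^\dagger L\, dt - iXH\, dt \right] U_t
   + U_t^\dagger \left[ L^\dagger X L\, dt \right] U_t \qquad (3.118) $$
$$ = j_t(\mathcal{L}_{L,H}[X])\, dt + j_t([L^\dagger, X])\, dA_t + j_t([X, L])\, dA_t^\dagger \qquad (3.119) $$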



where we have used the fact that the fundamental noises are non-anticipative to pull
them out of the $j_t$ terms and introduced the familiar Lindblad generator

$$ \mathcal{L}_{L,H}[X] = L^\dagger X L - \tfrac{1}{2} L^\dagger L X - \tfrac{1}{2} X L^\dagger L + i[H, X]. \qquad (3.120) $$

The form of Eq. (3.119) is pleasing, as it contains a deterministic part which depends
on the familiar open quantum system Lindblad generator, plus quantum noise pieces
which contain extra information regarding the field. Thus, if we were to take a partial
trace of the field system, we would recover the standard Heisenberg picture master
equation.

Footnote (on the term "Heisenberg evolution"): It is a bit disingenuous to call this the
Heisenberg evolution since $U_t$ is really the interaction picture propagator. To return to
the Heisenberg picture, we would need to rotate back by the free Hamiltonians. Often the
initial atomic state is an eigenstate of the free system Hamiltonian $H_s$, so that this return
rotation introduces canceling phases, indicating $U_t$ already gives the Heisenberg evolution.
However, this is not the case generally and will depend on the specifics of the system at hand.
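
The Lindblad generator of Eq. (3.120) is simple to evaluate numerically for finite-dimensional systems. The following is a minimal sketch, assuming NumPy and matrix representations of $L$, $H$ and $X$; the function name `lindblad_generator` and the decay rate `kappa = 0.7` are illustrative choices, not taken from the text. It checks two basic properties for the two-level atom with $L = \sqrt{\kappa}\,\sigma_-$: the identity is annihilated, and the excited-state population $\sigma_+\sigma_-$ decays at rate $\kappa$.

```python
import numpy as np

def lindblad_generator(L, H, X):
    """Apply the Heisenberg-picture generator of Eq. (3.120) to the operator X."""
    Ld = L.conj().T
    dissipator = Ld @ X @ L - 0.5 * (Ld @ L @ X) - 0.5 * (X @ Ld @ L)
    return dissipator + 1j * (H @ X - X @ H)

# Illustrative two-level atom in the basis (|e>, |g>); kappa is an assumed rate.
kappa = 0.7
sigma_minus = np.array([[0, 0], [1, 0]], dtype=complex)   # |g><e|
L = np.sqrt(kappa) * sigma_minus
H = np.zeros((2, 2), dtype=complex)
excited_population = sigma_minus.conj().T @ sigma_minus   # sigma_+ sigma_-

# Generator applied to the identity and to the excited-state population.
gen_identity = lindblad_generator(L, H, np.eye(2, dtype=complex))
gen_population = lindblad_generator(L, H, excited_population)
```

Here `gen_identity` vanishes, reflecting that the flow $j_t$ preserves the identity, while `gen_population` equals `-kappa * excited_population`, the familiar spontaneous-emission decay.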

Of course, if it were easy to measure the atomic system directly, we would be done
at this point. We have a dynamical equation which describes the exact evolution of
the atoms and field in the weak coupling limit and $j_t(X)$ is the corresponding evolved
atomic operator to be measured. But lacking such direct atomic measurements,
we instead use the outgoing or scattered field as a probe of the atomic dynamics.
In order to perform inference, we therefore need to fix the probe observable we
intend to measure. The most common quantum optics measurements are photon
counting, related to the gauge process $\Lambda_t$, and homodyne detection, related to the noise
quadratures $e^{-i\phi} A_t + e^{i\phi} A_t^\dagger$. I will focus on the latter measurement and refer the
reader to [Barchielli 2003; Bouten et al. 2007a] for more discussion on the topic.

We still need to decide which quadrature to measure and should do so by looking
at the form of $dU_t$. Comparing to our two level atom example, $U_t$ appears to have
come from a system-field coupling of the form $i(L + L^\dagger)(a - a^\dagger)$. That is, the atoms
appear to couple to the p field quadrature. We therefore would not want to measure
that quadrature of the field, since it commutes with the coupling Hamiltonian and
therefore remains unchanged under evolution by $U_t$. Instead, we want to measure the
orthogonal x quadrature, which will evolve non-trivially under $U_t$ and carry off some
information about its interaction with the atoms. We thus write the measurements
as $Y_t = U_t^\dagger (A_t + A_t^\dagger) U_t$, which corresponds to the scattered x quadrature. We can
again use the quantum Ito rules and some patience to calculate (see the footnote below)

$$ dY_t = dU_t^\dagger\, (A_t + A_t^\dagger) U_t + U_t^\dagger\, d\big((A_t + A_t^\dagger) U_t\big) + dU_t^\dagger\, d\big((A_t + A_t^\dagger) U_t\big) \qquad (3.121) $$
$$ = dU_t^\dagger (A_t + A_t^\dagger) U_t + U_t^\dagger (dA_t + dA_t^\dagger) U_t + U_t^\dagger (A_t + A_t^\dagger)\, dU_t
   + U_t^\dagger (dA_t + dA_t^\dagger)\, dU_t + dU_t^\dagger (dA_t + dA_t^\dagger) U_t + dU_t^\dagger (A_t + A_t^\dagger)\, dU_t
   + dU_t^\dagger (dA_t + dA_t^\dagger)\, dU_t \qquad (3.122) $$
$$ = j_t(L + L^\dagger)\, dt + dA_t + dA_t^\dagger \qquad (3.123) $$

Footnote: I don't list all the steps because it is a useful exercise in the Ito rules to calculate
these terms by hand. I will give the hint that all terms involving $(A_t + A_t^\dagger)$ are most easily
treated together, as they simplify in a fashion similar to $d(U_t^\dagger U_t)$, which we know is zero since
$U_t^\dagger U_t = I$.

We see that the measurements are an observation of $j_t(L + L^\dagger)$ corrupted
by the input x quadrature noise $dA_t + dA_t^\dagger$. Recall that the p-quadrature drives
the $j_t$ evolution, so that both non-commuting noises are somehow mixed up in the
observations process.

We would like to define the filtering problem as $\mathbb{P}(j_t(X) | \mathcal{Y}_{[0,t]})$, the conditional
expectation of the atomic state given the entire measurement record. However, as
we saw in defining the quantum conditional expectation, we should be careful that
this is actually a well-posed inference problem. The first question is whether the
entire observations process generates a commutative algebra. If it did not, we would
not be able to incrementally build up information over time, as future measurements
could not be combined with past measurements. Fortunately, it is straightforward
to check that the observations are commutative, which is equivalent to satisfying the
self-nondemolition property. As a start, note that

$$ U_t^\dagger (A_s + A_s^\dagger) U_t = U_s^\dagger (A_s + A_s^\dagger) U_s + \int_s^t U_r^\dagger\, \mathcal{L}_{L,H}[A_s + A_s^\dagger]\, U_r\, dr
   + \int_s^t U_r^\dagger [L^\dagger, A_s + A_s^\dagger] U_r\, dA_r + \int_s^t U_r^\dagger [A_s + A_s^\dagger, L] U_r\, dA_r^\dagger \qquad (3.124) $$
$$ = U_s^\dagger (A_s + A_s^\dagger) U_s $$

indicating $Y_s = U_t^\dagger (A_s + A_s^\dagger) U_t$ for $t > s$. This is essentially a consequence of the
Markov approximation implicit in the weak coupling limit. We see that $A_s + A_s^\dagger$, which
corresponds to the free fields at time $s$ evolved under the free field Hamiltonian, only
interacts with the atoms at time $s$ and then moves on. In other words, under the
Markov approximation, the interaction occurs instantaneously, so that for $s < t$,
$U_t$ does nothing to the interaction picture operators $A_s + A_s^\dagger$. Using this fact, we can
readily verify

$$ [Y_t, Y_s] = [U_t^\dagger (A_t + A_t^\dagger) U_t,\ U_t^\dagger (A_s + A_s^\dagger) U_t] = U_t^\dagger [A_t + A_t^\dagger,\ A_s + A_s^\dagger] U_t = 0. \qquad (3.125) $$

Thus, $\mathcal{Y}_{[0,t]} = \mathrm{vN}(Y_s : 0 \le s \le t)$ is a commutative von Neumann algebra which
corresponds to a classical stochastic process via the spectral theorem. This fixes the
commutative algebra, $\mathcal{Y}_{[0,t]}$, used to define the quantum conditional expectation.
The only remaining check is that $j_t(X)$ is in its commutant, which is easily verified
using the property just discussed,

$$ [j_t(X), Y_s] = [U_t^\dagger X U_t,\ U_t^\dagger (A_s + A_s^\dagger) U_t] = 0. \qquad (3.126) $$

We therefore have a well-defined inference problem which is summarized in the 
following definition. 

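Definition 3.11. The filtering problem in quantum optics, defined on the quantum
probability space of the joint atom-field system with state $\mathbb{P}$ given by the product of
the atomic state $\mathbb{P}_s$ and the field vacuum state, is to calculate

$$ \pi_t[X] = \mathbb{P}(j_t(X) | \mathcal{Y}_{[0,t]}) \qquad (3.127) $$

for the system-observations pair

$$ dj_t(X) = j_t(\mathcal{L}_{L,H}[X])\, dt + j_t([L^\dagger, X])\, dA_t + j_t([X, L])\, dA_t^\dagger \qquad (3.128) $$
$$ dY_t = j_t(L + L^\dagger)\, dt + dA_t + dA_t^\dagger \qquad (3.129) $$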



Before solving this problem in the following subsection, let us pause and reflect
on what makes this different than the classical filtering problem. Classically, we
considered the problem of estimating the state of a stochastically evolved system given
observations corrupted by independent noise. Thus, there was some underlying
system state to find and only technical reasons limited our observation of that state. In
the quantum case, we have not added any extra corrupting noise; the limited
observations of the system are a consequence of the fundamental uncertainties in quantum
mechanics which arise from non-commuting observables, which here are the two field
quadratures. Furthermore, there is no hidden or underlying state independent of
the observations process, since measurement back-action non-trivially changes the
state of the system. Fortunately, the structure of the filter is such that we can still
estimate $j_t(X)$ using the observations process.

3.3.2 Quantum Filtering Equation 

Again we will follow the classical reference probability approach taken in solving 
the non-linear filtering problem. Recall that the approach is to find a new measure 
under which the observations and state are statistically independent, so that the 
conditional expectation becomes trivial to evaluate. For us, this means finding a 
new state under which the quantum conditional expectation is much simpler, after 
which an application of the quantum Bayes rule in Theorem 3.5 allows us to relate 
this back to the original problem. All of these steps were taken in Example 3.5, so 
you may refer to that for another demonstration of what follows. 

Our first step is to find a quantum analogue of the Girsanov transformation,
which here amounts to finding a state under which $Y_t$ is a Wiener process. Instead,
it actually is more convenient to work with the input quadrature $Z_t = A_t + A_t^\dagger$
directly, suggesting we work under a new state

$$ \mathbb{Q}_t(X) = \mathbb{P}(U_t^\dagger X U_t). \qquad (3.130) $$

Thus, we move the time-evolution via $U_t$ onto the states and work with operators in
the free field interaction picture. From Example 3.5, we have
$\mathbb{P}(U_t^\dagger X U_t | U_t^\dagger \mathcal{Z}_t U_t) = U_t^\dagger\, \mathbb{Q}_t(X | \mathcal{Z}_t)\, U_t$, where $\mathcal{Z}_t$ is the von Neumann algebra
generated by $Z_s$, $0 \le s \le t$, which is related to the original observations algebra via
$\mathcal{Y}_{[0,t]} = U_t^\dagger \mathcal{Z}_t U_t$. Thus, we have

$$ \pi_t[X] = \mathbb{P}(j_t(X) | \mathcal{Y}_{[0,t]}) = U_t^\dagger\, \mathbb{Q}_t(X | \mathcal{Z}_t)\, U_t. \qquad (3.131) $$

The hope is that the conditional expectation may be calculated more easily under
$\mathbb{Q}_t$, where the $U_t$ evolution is part of the state, after which we reapply $U_t$ to return
to our original picture. The Girsanov analogy comes from noting that $Z_t$ is precisely
a classical Wiener process under the original state $\mathbb{P}$, so that applying the quantum
Bayes rule to Eq. (3.131) would allow us to easily evaluate the $\mathbb{Q}_t$ conditional
expectation in terms of $\mathbb{P}$ conditional expectations. But just as in Example 3.5 we have
the problem that our change of measure operator $U_t$ is not in the commutant $\mathcal{Z}_t'$ since
$Z_t$, which is the x field quadrature, does not commute with the p field quadrature
which generates $U_t$.

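Fortunately, the vacuum reference state provides a nice means for finding a $V_t$ in the
commutant $\mathcal{Z}_t'$ which nonetheless satisfies $\mathbb{P}(U_t^\dagger X U_t) = \mathbb{P}(V_t^\dagger X V_t)$ for all atomic
operators $X$. Such a $V_t$ is governed by the QSDE

$$ dV_t = \left\{ L (dA_t + dA_t^\dagger) - \tfrac{1}{2} L^\dagger L\, dt - iH\, dt \right\} V_t \qquad (3.132) $$

which will give the same vacuum expectation as $U_t$ since $dA_t |\Phi\rangle = 0$. Clearly, $V_t \in \mathcal{Z}_t'$
since it is driven by the x quadrature noise $Z_t = A_t + A_t^\dagger$.

An application of the quantum Bayes formula in Theorem 3.5 gives the quantum
Kallianpur-Striebel formula

$$ \pi_t[X] = \frac{U_t^\dagger\, \mathbb{P}(V_t^\dagger X V_t | \mathcal{Z}_t)\, U_t}{U_t^\dagger\, \mathbb{P}(V_t^\dagger V_t | \mathcal{Z}_t)\, U_t} = \frac{\sigma_t(X)}{\sigma_t(I)} \qquad (3.133) $$

where $\sigma_t(X) = U_t^\dagger\, \mathbb{P}(V_t^\dagger X V_t | \mathcal{Z}_t)\, U_t$ and all the conditioning is on $\mathcal{Z}_t$, the algebra
generated by the $\mathbb{P}$-Wiener process $A_t + A_t^\dagger$. In short, the whole point of introducing
$\mathbb{Q}_t$ and $V_t$ was to make $\mathcal{Z}_t$ the conditioned algebra, whose statistics we know given our
understanding of the Wiener process.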

We now focus on deriving an SDE for $\sigma_t(X)$, which is done by explicit calculation.
From the quantum Ito rules in integral form, we have

$$ V_t^\dagger X V_t = X + \int_0^t V_s^\dagger\, \mathcal{L}_{L,H}[X]\, V_s\, ds + \int_0^t V_s^\dagger (L^\dagger X + X L) V_s\, d(A_s + A_s^\dagger). \qquad (3.134) $$

This is easily derived by noting that $j_t(X)$ in Eq. (3.119) is identical save for changing
from $U_t$ to $V_t$, which amounts to mapping $-L^\dagger dA_t \mapsto +L\, dA_t$. We next evaluate the
conditional expectations of each term in this expression, using the fact that the
conditional expectation may be pulled inside the integrals to find

$$ \mathbb{P}(V_t^\dagger X V_t | \mathcal{Z}_t) = \mathbb{P}(X) + \int_0^t \mathbb{P}(V_s^\dagger\, \mathcal{L}_{L,H}[X]\, V_s | \mathcal{Z}_s)\, ds
   + \int_0^t \mathbb{P}(V_s^\dagger (L^\dagger X + X L) V_s | \mathcal{Z}_s)\, dZ_s \qquad (3.135) $$

where $\mathcal{Z}_t$ denotes the algebra generated by $Z_s = A_s + A_s^\dagger$ up to time $t$. Finally, we apply
the Ito rules to $U_t^\dagger\, \mathbb{P}(V_t^\dagger X V_t | \mathcal{Z}_t)\, U_t$ to find

$$ d\sigma_t(X) = \sigma_t(\mathcal{L}_{L,H}[X])\, dt + \sigma_t(L^\dagger X + X L)\, dY_t \qquad (3.136) $$

where we have identified the observations process $Y_t = U_t^\dagger Z_t U_t$. In order to recover
the normalized form, we again use the Ito rules as we did in solving the classical
Kushner-Stratonovich equation, cf. Eq. (2.111), to arrive at the filter given in the
following theorem.

Theorem 3.7 (Quantum Filtering Equation). The solution to the quantum
filtering problem satisfies the SDE

$$ d\pi_t[X] = \pi_t[\mathcal{L}_{L,H}[X]]\, dt + \left( \pi_t[L^\dagger X + X L] - \pi_t[L^\dagger + L]\, \pi_t[X] \right) \left( dY_t - \pi_t[L + L^\dagger]\, dt \right) \qquad (3.137) $$

with $\pi_0[X] = \mathbb{P}_s(X) = \mathrm{Tr}[X\rho]$.

This is precisely a recursive formula which may be integrated on a classical computer
by processing the observations process $Y_t$. Comparing this to the classical
non-linear filter in Eq. (2.112), we see the innovations process appearing as
$dY_t - \pi_t[L + L^\dagger]\, dt$, which is again precisely a classical Wiener process. This provides a very nice
interpretation of the resulting filter, in which the innovations drive the estimate by
pulling out all the information from the measurement process which is not already in
our estimate $\pi_t[X]$. This information, which includes the quantum noise $dA_t + dA_t^\dagger$
in addition to the true atomic state $j_t(X)$, is then used to condition our estimate of
the atomic system in accordance with the expected back action which results from
the field measurement.
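
For reference, the normalization step that takes the unnormalized filter of Eq. (3.136) to the normalized filter of Eq. (3.137) can be sketched explicitly. The sketch assumes the classical Ito rule $dY_t^2 = dt$ for the commutative observations process, writes $\pi_t[X] = \sigma_t(X)/\sigma_t(I)$, and introduces the shorthand $m_t$ purely for brevity.

```latex
\begin{align*}
d\sigma_t(I) &= \sigma_t(L + L^\dagger)\,dY_t, \qquad m_t \equiv \pi_t[L + L^\dagger],\\
d\!\left(\frac{1}{\sigma_t(I)}\right) &= \frac{1}{\sigma_t(I)}\left(-m_t\,dY_t + m_t^2\,dt\right),\\
d\pi_t[X] &= \frac{d\sigma_t(X)}{\sigma_t(I)} + \sigma_t(X)\,d\!\left(\frac{1}{\sigma_t(I)}\right)
           + d\sigma_t(X)\,d\!\left(\frac{1}{\sigma_t(I)}\right)\\
 &= \pi_t[\mathcal{L}_{L,H}[X]]\,dt + \pi_t[L^\dagger X + XL]\,dY_t - \pi_t[X]\,m_t\,dY_t
    + \pi_t[X]\,m_t^2\,dt - \pi_t[L^\dagger X + XL]\,m_t\,dt\\
 &= \pi_t[\mathcal{L}_{L,H}[X]]\,dt
    + \left(\pi_t[L^\dagger X + XL] - m_t\,\pi_t[X]\right)\left(dY_t - m_t\,dt\right).
\end{align*}
```

The first line uses $\mathcal{L}_{L,H}[I] = 0$, and the last line makes the innovations increment $dY_t - m_t\,dt$ appear as a common factor.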

Just as we saw classically, the form of the filtering equation above is not always
convenient, since it requires iterating $\pi_t[X], \pi_t[L + L^\dagger], \pi_t[L^\dagger X], \ldots$ until a closed
system of equations is found. Instead, one often works with a state representation
in terms of the conditional density matrix $\rho_t$ which satisfies $\pi_t[X] = \mathrm{Tr}[X\rho_t]$ for all
atomic operators $X$. Plugging this into Eq. (3.137) gives the quantum filter in its
adjoint form as

$$ d\rho_t = -i[H, \rho_t]\, dt + \left( L\rho_t L^\dagger - \tfrac{1}{2} L^\dagger L \rho_t - \tfrac{1}{2} \rho_t L^\dagger L \right) dt
   + \left( L\rho_t + \rho_t L^\dagger - \mathrm{Tr}[(L + L^\dagger)\rho_t]\, \rho_t \right) dW_t \qquad (3.138) $$

where we recognize the familiar Lindblad form for the deterministic pieces,
characteristic of an open quantum system master equation (see [Walls and Milburn 2008]
for more detail). The stochastic term, which is non-linear in $\rho_t$, performs the
conditioning via the innovations process which I have written as the Wiener process
$dW_t = dY_t - \mathrm{Tr}[(L + L^\dagger)\rho_t]\, dt$. The adjoint form suggests a nice interpretation of the
filter as a continuous measurement of the observable $L + L^\dagger$. Indeed, one can show
[Adler et al. 2001] that if $H = 0$, the steady-state of $\rho_t$ is precisely an eigenstate of
$L + L^\dagger$, with each eigenstate occurring with the probability given by its overlap with the
initial state $\rho_0$. Thus, rather than considering
an instantaneous projective measurement of $L + L^\dagger$, the measurement is extended
in time and appears as a deterministically driven Wiener process, opening the door
for performing feedback [Wiseman 1994] using the current filtered estimate. Such a
possibility will be considered in Chapter 6.
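
The adjoint filter of Eq. (3.138) can be integrated numerically from a measurement record. The following is a minimal sketch, assuming NumPy and a plain Euler-Maruyama discretization (chosen here for brevity; it does not guarantee positivity of the state). The function names, the time step and the qubit parameters are illustrative assumptions. For lack of real data, the record is generated self-consistently as $dY_t = \mathrm{Tr}[(L + L^\dagger)\rho_t]\,dt + dW_t$, which is only appropriate when the filter is initialized in the true state.

```python
import numpy as np

def sme_step(rho, H, L, dY, dt):
    """One Euler-Maruyama step of the adjoint quantum filter, Eq. (3.138)."""
    Ld = L.conj().T
    drift = (-1j * (H @ rho - rho @ H)
             + L @ rho @ Ld - 0.5 * (Ld @ L @ rho) - 0.5 * (rho @ Ld @ L))
    mean = np.trace((L + Ld) @ rho).real          # Tr[(L + L^dagger) rho_t]
    dW = dY - mean * dt                           # innovations increment
    gain = L @ rho + rho @ Ld - mean * rho
    return rho + drift * dt + gain * dW

def simulate_sme(rho0, H, L, dt, n_steps, seed=0):
    """Filter a record generated self-consistently from the filter itself."""
    rng = np.random.default_rng(seed)
    rho = rho0.astype(complex)
    record = np.empty(n_steps)
    Ld = L.conj().T
    for k in range(n_steps):
        mean = np.trace((L + Ld) @ rho).real
        record[k] = mean * dt + rng.normal(0.0, np.sqrt(dt))   # dY_t
        rho = sme_step(rho, H, L, record[k], dt)
    return rho, record

# Illustrative qubit: H = B sigma_y, L = sqrt(kappa) sigma_z, initial state |+x>.
sigma_y = np.array([[0, -1j], [1j, 0]])
sigma_z = np.diag([1.0 + 0j, -1.0])
rho_plus_x = 0.5 * np.ones((2, 2), dtype=complex)
rho_final, record = simulate_sme(rho_plus_x, 0.5 * sigma_y, sigma_z, dt=1e-3, n_steps=2000)
```

In practice `sme_step` would be fed increments `dY` from an actual detector; the innovation is then recomputed inside the step from the current estimate, exactly as in the definition of $dW_t$ above.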

Before closing this section with an example, I do want to note that continuous
measurement can be considered entirely within the generalized measurement and
quantum operations framework of quantum information theory [Jacobs and Steck
2006]. Although many of the mathematical subtleties are glossed over in that approach,
I believe there is also an interpretational issue which arises. Specifically, when working with
the conditional density matrix formalism, the measurements are usually written as

$$ dY_t = \mathrm{Tr}[(L + L^\dagger)\rho_t]\, dt + dW_t \qquad (3.139) $$

where $\rho_t$ is the conditional density matrix and $dW_t$ is a Wiener process that arises
from taking the central limit of many infinitesimal measurements. But as we saw
above, this is not the measurement process, which actually contains the true system
state $j_t(L + L^\dagger)$ and the quantum noise $dA_t + dA_t^\dagger$. The innovations Wiener process
$dW_t$ only arises by explicitly subtracting the current estimate from the measurements.
That is, if we want to learn something about the system, the measurements had better
contain some information about it, rather than just our current estimate corrupted
by noise. Philosophically, this amounts to deciding whether $\rho_t$ is the true state of
the atoms or whether it is simply our estimate of the true state. I prefer the latter
perspective, which allows for a careful consideration of the stability of the filter
under incorrect initial state estimates [van Handel 2009]. Such a case is not
uncommon, especially when the continuous measurement process is being used to
measure an unknown initial state.



Figure 3.4: Continuous measurement of a single qubit precessing in an external
magnetic field.

Example 3.6 (Qubit in a magnetic field). Consider the setup depicted in Figure 3.4.
A qubit, initially in the pure state $|+x\rangle$, precesses about a magnetic field $B$ while
undergoing a continuous measurement along z. In terms of the general framework,
$H = B\sigma_y$ and $L = \sqrt{\kappa}\,\sigma_z$, where $\kappa$ is the continuous measurement strength in the
weak coupling limit. We will not dwell on the underlying physical mechanism which
gives rise to the $\sigma_z$ measurement, though continuous polarimetry measurements could
suffice [Bouten et al. 2007b]. Plugging into Eq. (3.137), the quantum filter for the
Bloch vector $n_t = (\pi_t[\sigma_x], \pi_t[\sigma_y], \pi_t[\sigma_z])$ is

$$ d\pi_t[\sigma_x] = 2B\,\pi_t[\sigma_z]\, dt - 2\kappa\,\pi_t[\sigma_x]\, dt - 2\sqrt{\kappa}\,\pi_t[\sigma_z]\,\pi_t[\sigma_x]\, dW_t \qquad (3.140) $$
$$ d\pi_t[\sigma_y] = -2\kappa\,\pi_t[\sigma_y]\, dt - 2\sqrt{\kappa}\,\pi_t[\sigma_z]\,\pi_t[\sigma_y]\, dW_t \qquad (3.141) $$
$$ d\pi_t[\sigma_z] = -2B\,\pi_t[\sigma_x]\, dt + 2\sqrt{\kappa}\,(1 - \pi_t[\sigma_z]^2)\, dW_t \qquad (3.142) $$

with innovations $dW_t = dM_t - 2\sqrt{\kappa}\,\pi_t[\sigma_z]\, dt$. It is not difficult to verify that the
quantum filter maintains pure states and that the initial state $n_0 = (1, 0, 0)$ remains
on the Bloch circle in the x-z plane. Letting $\theta_t$ be the angle from the positive x-axis
such that $\tan\theta_t = \pi_t[\sigma_z]/\pi_t[\sigma_x]$, we then simplify the filter to

$$ d\theta_t = -2B\, dt + \kappa \sin(2\theta_t)\, dt + 2\sqrt{\kappa} \cos(\theta_t)\, dW_t \qquad (3.143) $$

where now $dW_t = dM_t - 2\sqrt{\kappa} \sin\theta_t\, dt$. Figure 3.5 shows a computer simulation of a
typical measurement trajectory and filtered Bloch vector values when $B = 0$. We see
that the initial +x state indeed collapses to a +z eigenstate, which is then a fixed
state of the continuous measurement.

Figure 3.5: (Bottom) Simulated typical measurement trajectory for continuous
Z measurement, $\kappa = 1$, $B = 0$. (Top) Filtered values of $\pi_t[\sigma_x]$ and $\pi_t[\sigma_z]$ for the
simulated trajectory.
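
A trajectory of the kind described in this example can be generated with a few lines of code. The following is a minimal sketch, assuming NumPy and a plain Euler-Maruyama discretization of the angle equation $d\theta_t = -2B\,dt + \kappa\sin(2\theta_t)\,dt + 2\sqrt{\kappa}\cos(\theta_t)\,dW_t$; it is not the code used for the thesis figure. The function name, time step, horizon and seed are illustrative assumptions, and the measurement record is produced self-consistently as $dM_t = 2\sqrt{\kappa}\sin\theta_t\,dt + dW_t$.

```python
import numpy as np

def simulate_qubit_filter(B, kappa, T, dt, seed=0):
    """Euler-Maruyama integration of the angle form of the qubit filter."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(T / dt))
    theta = np.empty(n_steps + 1)
    theta[0] = 0.0                      # |+x> lies on the positive x-axis
    dM = np.empty(n_steps)              # simulated measurement increments
    sk = np.sqrt(kappa)
    for k in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))                 # innovations increment
        dM[k] = 2 * sk * np.sin(theta[k]) * dt + dW       # record dM_t
        theta[k + 1] = (theta[k]
                        + (-2 * B + kappa * np.sin(2 * theta[k])) * dt
                        + 2 * sk * np.cos(theta[k]) * dW)
    return theta, dM

# No field, unit measurement strength: the state should collapse onto the z-axis.
theta, dM = simulate_qubit_filter(B=0.0, kappa=1.0, T=10.0, dt=1e-3, seed=0)
pi_x, pi_z = np.cos(theta), np.sin(theta)   # filtered Bloch components
```

With $B = 0$ the component `pi_x` decays towards zero while `pi_z` approaches $\pm 1$, which is the collapse onto a $\sigma_z$ eigenstate discussed above; which sign is reached depends on the noise realization.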



3.4 Summary 



Perhaps a yet unstated purpose of this chapter was to probe the distinction we tend to 
hold between what is quantum and what is classical in quantum information theory. 
We found that a commutative set of operators is well-described by a classical proba- 
bility model and that inference between commuting observables is readily performed 
in terms of the classical tools we developed in Chapter 2. Indeed, the quantum filter, 
which is capable of describing a continuous measurement of a quantum system as a 
stochastic process, is an entirely classical object. Moreover, the filter did not require 
using the standard projection postulate but instead recovers it in the long time, 
strong measurement strength limit. This perspective allows us to "look inside" the 
projective measurement, watch the wave function collapse and potentially modify it 
via feedback. In short, the traditional weirdness of quantum back-action arises natu- 
rally from the interplay of classical conditioning and quantum dynamics. The second 
goal was to again convince the reader that spending a little time familiarizing oneself 
with the mathematics of quantum probability theory, quantum stochastic calculus 
and quantum filtering provides a sophisticated yet simple approach to solving many 
problems in quantum optics. In fact, most of the original research in this thesis lever- 
ages the filtering formalism in this chapter, solving the problem of continuous-time 
quantum parameter estimation and studying quantum error correction via contin- 
uous measurement and feedback. In relating those results, I hope the reader will 
appreciate the groundwork they laid in this chapter and the ease with which classi- 
cal stochastic control and estimation methods are trivially adapted to the quantum 
Chapter 4

Quantum Parameter Estimation 



In this chapter, I extend the quantum filtering techniques of Chapter 3 to allow 
for the estimation of unknown parameters which drive the evolution of the system 
undergoing continuous measurement. By embedding parameter estimation in the 
standard quantum filtering formalism, we will find the optimal Bayesian filter for 
cases when the parameter takes on a finite range of values. For cases when the 
parameter is continuous valued, I develop quantum particle filters as a practical 
computational method for quantum parameter estimation. The techniques developed 
within this chapter were published in [Chase and Geremia 2009b]. 



4.1 Introduction 



Determining unknown values of parameters from noisy measurements is a ubiquitous
problem in physics and engineering. In quantum mechanics, the single-parameter
problem is posed as determining a coupling parameter $\xi$ that controls the evolution
of a probe quantum system via a Hamiltonian of the form $H_\xi = \xi H_0$ [Boixo et al.
2007; Braunstein and Caves 1994; Braunstein et al. 1996; Giovannetti et al. 2004;
2006; Helstrom 1976; Holevo 1982]. Traditionally, an estimation procedure proceeds
by (i) preparing an ensemble of probe systems, either independently or jointly; (ii)
evolving the ensemble under $H_\xi$; (iii) measuring an appropriate observable in order
to infer $\xi$. The quantum Cramer-Rao bound [Braunstein and Caves 1994; Braunstein
et al. 1996; Cramer 1946; Helstrom 1976; Holevo 1982] gives the optimal sensitivity
for any possible estimator and much research has focused on achieving this bound
in practice, using entangled probe states and nonlinear probe Hamiltonians [Nagata
et al. 2007; Pezze et al. 2007; Woolley et al. 2008].

Yet, it is often technically difficult to prepare the exotic states and Hamiltonians 
needed for improved sensitivity. Instead, an experiment is usually repeated many 
times to build up sufficient statistics for the estimator. In contrast, the burgeoning 
field of continuous quantum measurement [Bouten et al. 2009] provides an opportu- 
nity for on-line single-shot parameter estimation, in which an estimate is provided 
in near real-time using a measurement trajectory from a single probe system. Pa- 
rameter estimation via continuous measurement has been previously studied in the 
context of force estimation [Verstraete et al. 2001] and magnetometry [Geremia et al. 
2003]. Although Verstraete et al. develop a general framework for quantum parameter
estimation, both [Geremia et al. 2003] and [Verstraete et al. 2001] focus on the
readily tractable case when the dynamical equations are linear and the quantum
states have Gaussian statistics. In this case, the optimal estimator is the quantum
analog of the classical Kalman filter [Belavkin 1999; Kalman 1960; Kalman and Bucy
1961], seen in Example 2.5 in Chapter 2.

In this chapter, I develop on-line estimators for continuous measurement when 
the dynamics and states are not restricted. Rather than focusing on fundamental 
quantum limits (which is the topic of Chapter 5), I instead consider the more basic 
problem of developing an actual parameter filter for use with continuous quantum 
measurements. By embedding parameter estimation in the standard quantum filter- 
ing formalism [Bouten et al. 2009], I construct the optimal Bayesian estimator for 
parameters drawn from a finite dimensional set. The resulting filter is a generalized 
form of one derived by Jacobs for binary state discrimination [Jacobs and Steck 2006] . 
Using recent stability results of van Handel [van Handel 2009] , I give a simple check 
for whether the estimator can successfully track to the true parameter value in an
asymptotic time limit. For cases when the parameter is continuous valued, I develop
quantum particle filters as a practical computational method for quantum parameter
estimation. These are analogous to, and inspired by, particle filtering methods that
have had much success in classical filtering theory [Arulampalam et al. 2002; Doucet
et al. 2001]. Although the quantum particle filter is necessarily sub-optimal, I present
numerical simulations which suggest it performs well in practice. Throughout, I
demonstrate the techniques using a single qubit magnetometer.



4.2 Estimation of a parameter from a finite set 

We begin by considering the case where the parameter $\xi$ takes on a known, finite
set of values. Using the quantum filtering techniques in Chapter 3, we know that
continuous measurements of the probe^1 system that couples to the parameter are
well-described by a quantum filter as in Eq. (3.137) with Hamiltonian

$$ H = \xi H_0, \qquad (4.1) $$

where $H_0$ is a system (atomic) operator. Recall that the space of system (atomic)
operators, as introduced in Chapter 3, is distinct from the set of operators on the
ancillary system (field) used to perform the continuous measurement. Although what
follows applies for arbitrary systems which admit a continuous measurement description,
we fix our language to that of atoms and fields for a more transparent discussion.

Supposing we knew the true value of the parameter, the quantum filtering equations
would give us the best least-squares estimate of the atomic system conditioned
on the measurements and the knowledge of dynamics induced by $\xi$ through $H$. But
given the optimality of the filter, we could equally well embed the parameter as
a diagonal operator $\Xi$ acting on an auxiliary quantum space, after which the filter
still gives the best estimate of both system and auxiliary space operators. Finding
the best estimate of $\xi$ conditioned on the measurements simply corresponds to
integrating the equations for $\pi_t[\Xi]$.

^1 Note that the word "probe" is used in regard to the system which couples to the
parameter and is then used to infer the value of $\xi$. This is in addition to the idea of using
an ancillary system, such as the electromagnetic field, to perform continuous measurements
on the probe system.

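More precisely, extend the atomic Hilbert space $\mathcal{H}_S \mapsto \mathcal{H}_\xi \otimes \mathcal{H}_S$ and the operator
space $\mathscr{A}_S \mapsto \mathfrak{D}(\mathcal{H}_\xi) \otimes \mathscr{A}_S$, where $\mathfrak{D}(\mathcal{H}_\xi)$ is the set of diagonal operators on $\mathcal{H}_\xi$.
Assuming $\xi$ takes on $N$ possible values $\{\xi_1, \ldots, \xi_N\}$, $\dim \mathfrak{D}(\mathcal{H}_\xi) = N$. Introduce the
diagonal operator

$$\mathfrak{D}(\mathcal{H}_\xi) \ni \Xi = \sum_{i=1}^{N} \xi_i |\xi_i\rangle\langle\xi_i| \tag{4.2}$$

so that $\Xi|\xi_i\rangle = \xi_i|\xi_i\rangle$ with $|\xi_i\rangle \in \mathcal{H}_\xi$. This allows one to generalize Eq. (4.1) as

$$H \mapsto \Xi \otimes H_0 \in \mathfrak{D}(\mathcal{H}_\xi) \otimes \mathscr{A}_S. \tag{4.3}$$

Any remaining atomic operators $X_A \in \mathscr{A}_S$ act as the identity on the auxiliary space,
i.e. $X_A \mapsto I \otimes X_A$. Given these definitions, the derivation of the quantum filtering
equation remains essentially unchanged, so that the filter in either the operator form of
Eq. (3.137) or the adjoint form of Eq. (3.138) is simply updated with the extended
forms of operators given in the last paragraph.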
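As a concrete illustration of the extension in Eq. (4.2) and Eq. (4.3), the following
minimal sketch builds the extended operators numerically with Kronecker products. The
parameter values, the choice of $\sigma_y$ for $H_0$ and the helper name `extend_operators`
are assumptions made purely for illustration; they are not fixed by the text.

```python
import numpy as np

def extend_operators(xi_values, H0, X_A):
    """Return (Xi, Xi (x) H0, I (x) X_A) for a finite set of parameter values.

    xi_values, H0 and X_A are illustrative inputs chosen by the caller.
    """
    xi_values = np.asarray(xi_values, dtype=float)
    Xi = np.diag(xi_values)                         # diagonal operator of Eq. (4.2)
    H_ext = np.kron(Xi, H0)                         # extended Hamiltonian of Eq. (4.3)
    X_ext = np.kron(np.eye(len(xi_values)), X_A)    # atomic operator is trivial on H_xi
    return Xi, H_ext, X_ext

# Hypothetical example: two candidate values and Pauli operators on a qubit.
sigma_y = np.array([[0, -1j], [1j, 0]])
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)
Xi, H_ext, X_ext = extend_operators([2.0, 5.0], sigma_y, sigma_z)
# H_ext is block diagonal: block i equals xi_i * H0.
```

The block-diagonal structure is what later allows the filter to be written as an
ensemble of ordinary filters, one per candidate parameter value.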

Since $\xi$ is a classical parameter, we require that the reduced conditional density
matrix $(\rho_\xi)_t = \mathrm{Tr}_{\mathcal{H}_S}(\rho_t)$ be diagonal in the basis of $\mathcal{H}_\xi$. Thus we can write

$$(\rho_\xi)_t = \sum_{i=1}^{N} p_t^{(i)} |\xi_i\rangle\langle\xi_i| \tag{4.4}$$

where

$$p_t^{(i)} = \mathrm{Tr}\left[(|\xi_i\rangle\langle\xi_i| \otimes I)\rho_t\right] = \pi_t\left[|\xi_i\rangle\langle\xi_i| \otimes I\right] = \mathbb{E}\left[|\xi_i\rangle\langle\xi_i| \otimes I \,\middle|\, M_{[0,t]}\right] = P(\xi = \xi_i \,|\, M_{[0,t]}). \tag{4.5}$$

Then $p_t^{(i)}$ is precisely the conditional probability for $\xi$ to have the value $\xi_i$ and the set
$\{p_t^{(i)}\}$ gives the discrete conditional distribution of the random variable represented
by $\Xi$. Similarly, by requiring operators to be diagonal in $\mathcal{H}_\xi$, we ensure that they
correspond to classical random variables. In short, we have simply embedded filtering
of a truly classical random variable in the quantum formalism.

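The fact that both states and operators are diagonal in the auxiliary space suggests
using an ensemble form for filtering. As such, consider an ensemble consisting
of a weighted set of conditional atomic states, each state evolved under a different
$\xi_i$. Later, in section 4.3, we will call each ensemble member a quantum particle. For
now, we explicitly write the conditional quantum state as

$$\rho_t^E = \sum_{i=1}^{N} p_t^{(i)} |\xi_i\rangle\langle\xi_i| \otimes \rho_t^{(i)} \tag{4.6}$$

where $\rho_t^{(i)}$ is a density matrix on $\mathcal{H}_S$. The reduced state, $\mathrm{Tr}_{\mathcal{H}_S}(\rho_t^E)$, is clearly diagonal
in the basis of $\mathcal{H}_\xi$. Using the extended version of the adjoint quantum filter in
Eq. (3.138), one can derive the ensemble quantum filtering equations

$$d\rho_t^{(i)} = -i\xi_i[H_0, \rho_t^{(i)}]\,dt + \left(L\rho_t^{(i)}L^\dagger - \tfrac{1}{2}L^\dagger L\rho_t^{(i)} - \tfrac{1}{2}\rho_t^{(i)}L^\dagger L\right)dt + \left(L\rho_t^{(i)} + \rho_t^{(i)}L^\dagger - \mathrm{Tr}\left[(L + L^\dagger)\rho_t^{(i)}\right]\rho_t^{(i)}\right)dW_t \tag{4.7a}$$

$$dp_t^{(i)} = \left(\mathrm{Tr}\left[(L + L^\dagger)\rho_t^{(i)}\right] - \mathrm{Tr}\left[I \otimes (L + L^\dagger)\rho_t^E\right]\right)p_t^{(i)}\,dW_t \tag{4.7b}$$

$$dW_t = dM_t - \mathrm{Tr}\left[I \otimes (L + L^\dagger)\rho_t^E\right]dt \tag{4.7c}$$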

We see that each $\rho_t^{(i)}$ in the ensemble evolves under a quantum filter with $H = \xi_i H_0$
and is coupled to other ensemble members through the innovation factor $dW_t$, which
depends on the ensemble expectation of the measurement observable. Note that one
can incorporate any prior knowledge of $\xi$ in the weights of the initial distribution $\{p_0^{(i)}\}$.
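To make the coupling through the innovation concrete, here is a minimal sketch of a
single Ito-Euler step of the weight equations, Eq. (4.7b) and Eq. (4.7c). It assumes the
per-member expectations $m_i = \mathrm{Tr}[(L + L^\dagger)\rho_t^{(i)}]$ have already been computed;
the function name `update_weights` and the final clip-and-renormalize guard are my own
illustrative choices rather than part of the filter itself.

```python
import numpy as np

def update_weights(p, m, dM, dt):
    """One Ito-Euler step of Eq. (4.7b) and Eq. (4.7c).

    p  : weights p_t^(i), assumed to sum to one
    m  : m[i] = Tr[(L + L^dagger) rho_t^(i)] for each ensemble member
    dM : measurement increment over the step
    dt : step size
    """
    p = np.asarray(p, dtype=float)
    m = np.asarray(m, dtype=float)
    m_ens = np.dot(p, m)                  # Tr[I (x) (L + L^dagger) rho_t^E]
    dW = dM - m_ens * dt                  # innovation, Eq. (4.7c)
    p_new = p + (m - m_ens) * p * dW      # weight update, Eq. (4.7b)
    # Assumption: guard against finite-step excursions outside [0, 1].
    p_new = np.clip(p_new, 0.0, None)
    return p_new / p_new.sum(), dW

# Hypothetical numbers: two members with opposite expectations.
p_new, dW = update_weights([0.5, 0.5], [1.0, -1.0], dM=0.02, dt=1e-3)
```

Because the ensemble expectation is the weighted mean of the $m_i$, the update in
Eq. (4.7b) conserves the total weight before any clipping is applied.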

The reader should not be surprised that a similar approach would work for esti- 
mating more than one parameter at a time, such as three cartesian components of an 
applied magnetic field. One would introduce an auxiliary space for each parameter 
and extend the operators in the obvious way. The ensemble filter would then be
for a joint distribution over the multi-dimensional parameter space. Similarly, one
could use this formalism to distinguish initial states, rather than parameters which 
couple via the Hamiltonian. For example, in the case of state discrimination, one 
would introduce an auxiliary space which labels the possible input states, but does 
not play any role in the dynamics. The filtered weights would then be the probabil- 
ities to have been given a particular initial state. In fact, using a slightly different 
derivation, Jacobs derived equations similar to Eq. (4.7) for the case of binary state 
discrimination [Jacobs and Steck 2006]. Yanagisawa recently studied the general 
problem of retrodiction or "smoothing" of quantum states [Yanagisawa 2007]. In 
light of his work and results in the following section, the retrodictive capabilities of 
quantum filtering are very limited without significant prior knowledge or feedback. 

4.2.1 Conditions for convergence 

Although introducing the auxiliary parameter space does not change the derivation 
of the quantum filter, it is not clear how the initial uncertainty in the parameter will 
impact the filter's ability to ultimately track to the correct value. Indeed, outside 
of anecdotal numerical evidence (which I will presently add to), there has been little 
formal consideration of the sensitivity of the quantum filter to the initial state esti- 
mate. Recently, van Handel presented a set of conditions which determine whether 
the quantum filter will asymptotically track to the correct state independently of 
the assumed initial state [van Handel 2009]. Since we have embedded parameter es- 
timation in the state estimation framework, such stability then determines whether 
the quantum filter can asymptotically track to the true parameter, i.e. whether
$\lim_{t\to\infty} p_t^{(i)} = \delta_{ij}$ when $\xi = \xi_j$. In this section, I present van Handel's results in the
context of our parameter estimation formalism and present a simple check of asymp- 
totic convergence of the parameter estimate. We begin by reviewing the notions of 
absolute continuity and observability.

In the general stability problem, let $\rho_1$ be the true underlying state and $\rho_2$ be
the initial filter estimate. We say that $\rho_1$ is absolutely continuous with respect to $\rho_2$,
written $\rho_1 \ll \rho_2$, if and only if $\ker \rho_1 \supset \ker \rho_2$. In the context of parameter estimation,
we assume that we know the initial atomic state exactly, so that $\rho_1 \ll \rho_2$ as long as
the reduced states satisfy $\rho_1^\xi \ll \rho_2^\xi$. Since these reduced states are simply discrete
probability distributions, $\{(p_{t=0}^{(i)})_1\}$ and $\{(p_{t=0}^{(i)})_2\}$, this is just the standard definition
of absolute continuity in classical probability theory as we saw in Chapter 2 when
studying the Radon-Nikodym theorem 2.2. In our case, the true state has $(p_{t=0}^{(i)})_1 = \delta_{ij}$
if the parameter has value $\xi_j$. Thus, as long as our estimate has non-zero weight
on the $j$-th component, $\rho_1 \ll \rho_2$. This is trivially satisfied if $(p_{t=0}^{(i)})_2 \neq 0$ for all $i$.

The other condition for asymptotic convergence is that of observability. A system 
is observable if one can determine the exact initial atomic state given the entire 
measurement record over the infinite time interval. Observability is then akin to 
the ability to distinguish any pair of initial states on the basis of the measurement 
statistics alone. Recall the definition of the Lindblad generator $\mathcal{L}$ in Eq. (3.120) and
further define the operator $\mathcal{K}[X_A] = L^\dagger X_A + X_A L$. Then according to Proposition 5.7
in [van Handel 2009], the observable space $\mathcal{O}$ is defined as the smallest linear subspace
of $\mathscr{A}_S$ containing the identity and which is invariant under the action of $\mathcal{L}$ and $\mathcal{K}$.
The filter is observable if and only if $\mathscr{A}_S = \mathcal{O}$, or equivalently $\dim \mathscr{A}_S = \dim \mathcal{O}$.

In the finite-dimensional case, van Handel presents an iterative procedure for
constructing the observable space. Define the linear spaces $\mathcal{Z}_n \subset \mathscr{A}_S$ as

$$\mathcal{Z}_0 = \mathrm{span}\{I\}, \qquad \mathcal{Z}_n = \mathrm{span}\{\mathcal{Z}_{n-1}, \mathcal{L}[\mathcal{Z}_{n-1}], \mathcal{K}[\mathcal{Z}_{n-1}]\}, \quad n > 0. \tag{4.8}$$

The procedure terminates when $\mathcal{Z}_n = \mathcal{Z}_{n+1}$, which is guaranteed for some finite
$n = m$, as the dimension of $\mathcal{Z}_n$ cannot exceed the dimension of the ambient space
$\mathscr{A}_S$. Moreover, the terminal $\mathcal{Z}_m = \mathcal{O}$, so that using a Gram-Schmidt procedure, one
can iteratively find a basis for $\mathcal{O}$ and easily compute its dimension. Note that for
operators $A$ and $B$, the inner-product $\langle A, B \rangle$ is the Hilbert-Schmidt inner product
$\mathrm{Tr}[A^\dagger B]$.
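The iterative construction of Eq. (4.8) is straightforward to sketch numerically. The
code below grows an orthonormal basis with Gram-Schmidt under the Hilbert-Schmidt
inner product, applying $\mathcal{L}$ and $\mathcal{K}$ to each newly added element until nothing new
appears. The qubit model used to exercise it, $H = \Xi \otimes \sigma_y$ and $L = \sqrt{\kappa}\, I \otimes \sigma_z$,
is an assumption on my part that is consistent with, but not spelled out in, this
section; the function names and the tolerance are likewise illustrative.

```python
import numpy as np

def lindblad_generator(X, H, L):
    """Heisenberg-picture Lindblad generator acting on X (units with hbar = 1)."""
    Ld = L.conj().T
    return 1j * (H @ X - X @ H) + Ld @ X @ L - 0.5 * (Ld @ L @ X + X @ Ld @ L)

def k_map(X, L):
    """The map K[X] = L^dagger X + X L."""
    return L.conj().T @ X + X @ L

def observable_dimension(H, L, tol=1e-9):
    """Dimension of the span generated from the identity by the two maps above."""
    d = H.shape[0]
    identity = np.eye(d, dtype=complex)
    basis = [identity.ravel() / np.linalg.norm(identity)]
    frontier = [identity]
    while frontier:
        next_frontier = []
        for X in frontier:
            for Y in (lindblad_generator(X, H, L), k_map(X, L)):
                v = Y.ravel().astype(complex)
                for b in basis:                    # Gram-Schmidt with Tr[A^dagger B]
                    v = v - np.vdot(b, v) * b
                norm = np.linalg.norm(v)
                if norm > tol:                     # genuinely new direction
                    basis.append(v / norm)
                    next_frontier.append((v / norm).reshape(d, d))
        frontier = next_frontier
    return len(basis)

sigma_y = np.array([[0, -1j], [1j, 0]])
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)

def qubit_filter_dimension(B_values, kappa=1.0):
    """Observable-space dimension for the assumed qubit model with candidate fields."""
    Xi = np.diag(np.asarray(B_values, dtype=float))
    H = np.kron(Xi, sigma_y)
    L = np.sqrt(kappa) * np.kron(np.eye(len(B_values)), sigma_z)
    return observable_dimension(H, L)

dim_positive = qubit_filter_dimension([2.0, 5.0, 8.0, 12.0])
dim_signed = qubit_filter_dimension([1.0, -1.0])
```

Only newly found basis elements are fed back into the maps, which suffices because both
maps are linear. Comparing the returned dimension with the dimension of the relevant
operator space then gives the observability check described above.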

Given these definitions, one has the following theorem for filter convergence and 
corollary for parameter estimation. 

Theorem 4.1. (Theorem 2.5 in [van Handel 2009]) Let $\pi_t^{\rho_i}(X_A)$ be the evolved filter
estimate, initialized under state $\rho_i$. If the system is observable and $\rho_1 \ll \rho_2$, the
quantum filter is asymptotically stable in the sense that

$$\left|\pi_t^{\rho_1}(X_A) - \pi_t^{\rho_2}(X_A)\right| \xrightarrow{t \to \infty} 0 \qquad \forall X_A \in \mathscr{A}_S \tag{4.9}$$

where the convergence is under the observations generated by $\rho_1$.

One could use this theorem to directly check the stability of the quantum filter
for parameter estimation, using the extended forms of operators in $\mathcal{L}$ and $\mathcal{K}$, and
being careful that the observability condition is now $\dim \mathcal{O} = \dim\left(\mathfrak{D}(\mathcal{H}_\xi) \otimes \mathscr{A}_S\right)$.
However, the following corollary relates the observability of the parameter filter to
the observability of the related filter for a known parameter. Combined with the 
discussion of extending the absolute continuity condition, this then gives a simple 
check for the stability of the parameter filter. 

Corollary 4.1.1. Consider a parameter $\xi$ which takes on one of $N$ distinct positive
real values $\{\xi_i\}$. If the quantum filter with known parameter is observable, then the
corresponding extended filter for estimation of $\xi$ is observable.

Proof. In order to satisfy the observability condition, we require $\dim \mathcal{O} = Nr$, where
we have set $\dim \mathscr{A}_S = r$ and used the fact that $\dim \mathfrak{D}(\mathcal{H}_\xi) = N$. Given that the filter
for a known parameter is observable, its observable space coincides with $\mathscr{A}_S$ and has
an orthogonal operator basis $\{A_i\}$, where we take $A_0 = I$.

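Similarly, consider the $N$-dimensional operator space $\mathfrak{D}(\mathcal{H}_\xi)$. If $\{\xi_i\}$ are distinct,
any set of the form

$$\{\Xi^{k_1}, \Xi^{k_2}, \ldots, \Xi^{k_N}\}, \qquad k_i \in \mathbb{N}, \quad k_i \neq k_j \tag{4.10}$$

is linearly independent, since the corresponding generalized Vandermonde matrix

$$V = \begin{pmatrix} \xi_1^{k_1} & \xi_1^{k_2} & \cdots & \xi_1^{k_N} \\ \xi_2^{k_1} & \xi_2^{k_2} & \cdots & \xi_2^{k_N} \\ \vdots & \vdots & \ddots & \vdots \\ \xi_N^{k_1} & \xi_N^{k_2} & \cdots & \xi_N^{k_N} \end{pmatrix} \tag{4.11}$$

has linearly independent columns [Gantmakher 2000].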

Following the iterative procedure, we construct the observable space for the parameter
estimation filter starting with $I \otimes A_0$, which is the identity in the extended
space. We then iteratively apply $\mathcal{L}$ and $\mathcal{K}$ until we have an invariant linear span
of operators. The only non-trivial operator on the auxiliary space comes from the
Hamiltonian part of the Lindblad generator, which introduces higher and higher
powers of the diagonal matrix $\Xi$. Since $\dim\left(\mathfrak{D}(\mathcal{H}_\xi) \otimes \mathscr{A}_S\right)$ is finite, this procedure
must terminate. The resulting observable space can be decomposed into subspaces

$$\mathcal{O}_i = \{\Xi^{k_j} \otimes A_i\}, \qquad i = 1, \ldots, r, \quad k_j \in \mathbb{N} \tag{4.12}$$

where $\{k_j\}$ is some increasing sequence of non-negative integers which correspond to
the powers of $\Xi$ that are introduced via the Hamiltonian. Note that the specific
values of $k_j$ depend on the commutator algebra of $H_0$ and the atomic-space operator
basis $\{A_i\}$. Regardless, since the Hamiltonian in $\mathcal{L}$ can always add more powers
of $\Xi$, the procedure will not terminate until $\mathcal{O}_i$ is composed of a largest linearly
independent set of powers of $\Xi$. This set has at most $N$ distinct powers of $\Xi$, since
it cannot exceed the dimension of the auxiliary space. Given that any collection of
$N$ powers of $\Xi$ is linearly independent, this means once we reach a set of $N$ powers
$k_j$, the procedure terminates and $\dim \mathcal{O}_i = N$. Since $\mathcal{O}$ has $r$ subspaces $\mathcal{O}_i$, each of
dimension $N$, $\dim \mathcal{O} = Nr$ as desired and the observability condition is satisfied. □
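The linear-independence step of the proof can be checked numerically for any particular
choice of parameter values and powers. The short sketch below builds the generalized
Vandermonde matrix of Eq. (4.11) and computes its rank; the specific values and powers,
and the helper name `generalized_vandermonde`, are illustrative assumptions.

```python
import numpy as np

def generalized_vandermonde(xi, powers):
    """Matrix with entries V[i, j] = xi[i] ** powers[j], as in Eq. (4.11)."""
    xi = np.asarray(xi, dtype=float)
    powers = np.asarray(powers, dtype=int)
    return xi[:, None] ** powers[None, :]

# Distinct positive values with distinct powers: expect full column rank.
rank_distinct = np.linalg.matrix_rank(
    generalized_vandermonde([2.0, 5.0, 8.0, 12.0], [0, 2, 4, 6]))

# A value and its negative with even powers: the two rows coincide.
rank_signed = np.linalg.matrix_rank(
    generalized_vandermonde([1.0, -1.0], [0, 2]))
```

The second case illustrates how allowing both signs of a parameter value can remove
the rank guarantee when only even powers are generated.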



Although these conditions provide a simple check, I would like to stress that
they do not determine how quickly the convergence occurs, which will depend on the
specifics of the problem at hand. Additionally, as posed, the question of observability
is a binary one. One might expect that some unobservable systems are nonetheless
"more observable" than others or simply that unobservable systems might still be
useful for parameter estimation. Given the corollary above, one can see that this
may occur if a single parameter $\xi_j = 0$. Then $V$ has a row of all zeros, so that the
maximal dimension of a set of linearly independent powers of $\Xi$ is $N - 1$. Similarly, if
one allows both positive and negative real-valued parameters, the properties of $V$ are
not as obvious, though in many circumstances, having both $\xi_i$ and $-\xi_i$ renders the
system unobservable. We explore these nuances in numerical simulations presented
in the following section.

Qubit Example 

Consider using the single qubit from Example 3.6 in Chapter 3 as a probe for the 
magnetic field B. Since the initial state is restricted to the x-z plane, the y component 
of the Bloch vector is always zero and thus is not a relevant part of the atomic 
observable space, which is spanned by $\{I, \sigma_x, \sigma_z\}$. In other words, the filter with
known B is trivially observable, since we assume the initial state is known precisely. 

When $B$ is unknown, the ensemble parameter filter is given by

$$d\theta_t^{(i)} = -2B_i\,dt + \kappa\cos(\theta_t^{(i)})\left(\sin(\theta_t^{(i)}) - 2\langle\sigma_z\rangle^E\right)dt + 2\sqrt{\kappa}\cos(\theta_t^{(i)})\,dW_t \tag{4.13a}$$

$$dp_t^{(i)} = 2\sqrt{\kappa}\left(\sin(\theta_t^{(i)}) - \langle\sigma_z\rangle^E\right)p_t^{(i)}\,dW_t \tag{4.13b}$$

where $dW_t = dM_t - 2\sqrt{\kappa}\langle\sigma_z\rangle^E dt$ and $\langle\sigma_z\rangle^E = \sum_i p_t^{(i)}\sin(\theta_t^{(i)})$. We simulated this
filter by numerically integrating the quantum filter in Eq. (3.143) using a value for
$B$ uniformly chosen from the given ensemble of potential $B$ values. This generates
a measurement current $dM_t$, which is then fed into the ensemble filter of Eq. (4.13).
For all simulations, I set $\kappa = 1$ and used a simple Ito-Euler integrator as described
in Appendix B with a step-size dt = 10^^.

Figure 4.1: (a) Filtered $p_t^{(i)}$ for $B \in \{2\kappa, 5\kappa, 8\kappa, 12\kappa\}$. The filter tracks to the
true underlying value of $B = 2\kappa$. (b) Filtered $p_t^{(i)}$ for $B \in \{-\kappa, +\kappa\}$. The filter
does not track to $B = +\kappa$ with probability one, though it is the most probable
parameter value. In both panels the horizontal axis is time in units of $\kappa$.

Figure 4.1(a) shows a simulation of a filter for the case $B \in \{2\kappa, 5\kappa, 8\kappa, 12\kappa\}$.
The filter was initialized with a uniform distribution, $p_0^{(i)} = 1/4$. For the particular
trajectory shown, the true value of $B$ was $2\kappa$ and we see that the filter successfully
tracks to the correct $B$ value. This is not surprising, given that the potential values
of $B$ are positive and distinct, thus satisfying the convergence corollary. It is also
interesting to note that the filter quickly discounts the probabilities for $8\kappa$ and $12\kappa$, which
are far from the true value. Conversely, the filter initially favors the incorrect $B = 5\kappa$
value before homing in on the correct parameter value.

In Figure 4.1(b), we see a simulation for the case of $B \in \{+\kappa, -\kappa\}$, which does
not satisfy the convergence corollary. In fact, using the iterative procedure, one finds
the observable space is spanned by $\{I \otimes I, I \otimes \sigma_z, B \otimes \sigma_x, B^2 \otimes I, B^2 \otimes \sigma_z, B^3 \otimes \sigma_x\}$.
But since $B = \begin{pmatrix} \kappa & 0 \\ 0 & -\kappa \end{pmatrix}$, $B^2 = \kappa^2 I$ so that only 3 of the 6 operators are linearly
independent. Although the filter does not converge to the true underlying value of
$B = +\kappa$, it does reach a steady-state that weights the true value of $B$ more heavily.
Simulating 100 different trajectories for the filter, there were 81 trials for which the
final probabilities were weighted more heavily towards the true value of $B$. This
confirms our intuition that the binary question of observability does not entirely
characterize the performance of the parameter filter.

Figure 4.2: Rate of convergence ($\langle J_{0.95} \rangle$), averaged over 1000 trajectories. The
filters are for cases when possible $B$ values are either all larger or all smaller than
the measurement strength $\kappa$.

Figure 4.2 shows the rate of convergence of filters meant to distinguish different
sets of $B$. The rate of convergence is defined as the ensemble average of the random
variable

$$J_a = \begin{cases} 1, & \text{if } p_t^{(i)} > a \text{ for any } i \\ 0, & \text{otherwise.} \end{cases} \tag{4.14}$$

Although any individual run might fluctuate before converging to the underlying
$B$ value, the average of $J_a$ over many runs should give some sense of the rate at
which these fluctuations die down. For the simulation shown, I set $a = 0.95$ and
averaged $J_{0.95}$ over 1000 runs for two different cases: either all possible $B$ values
are greater than $\kappa$ or all are less than $\kappa$. As shown in the plot, the former case
shows faster convergence since the $B$ field drives the dynamics more strongly than
the measurement process, which in turn makes the trajectories of different ensemble
members more distinct. Of course, one cannot make the measurement strength too
weak since we need to learn about the system evolution. Therefore care must be taken
to tune the signal-to-noise ratio in the problem at hand, relative to the timescales
relevant for the parameter values of interest.
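Computing this rate of convergence from stored trajectories amounts to thresholding
the largest weight and averaging over runs. The following sketch assumes the weights
are stored in an array of shape (runs, timesteps, N); that layout, the function name
`convergence_rate` and the toy numbers are hypothetical.

```python
import numpy as np

def convergence_rate(weights, a=0.95):
    """Average over runs of the indicator J_a of Eq. (4.14) at each stored time.

    weights : array of shape (runs, timesteps, N) holding p_t^(i) per run
    """
    weights = np.asarray(weights, dtype=float)
    indicator = (weights.max(axis=2) > a).astype(float)   # J_a per run and time
    return indicator.mean(axis=0)                          # ensemble average

# Toy data: two runs, three stored times, two candidate parameter values.
toy = [
    [[0.50, 0.50], [0.96, 0.04], [0.99, 0.01]],
    [[0.50, 0.50], [0.60, 0.40], [0.97, 0.03]],
]
rate = convergence_rate(toy, a=0.95)
```

Note that the indicator only asks whether some weight exceeds the threshold, so it
measures how quickly the filter commits to a value, not whether that value is correct.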
4.3 Quantum Particle Filter

Abstractly, developing a parameter estimator in the continuous case is not very
different from the finite-dimensional case. One can still introduce an auxiliary
space $\mathcal{H}_\xi$, which is now infinite dimensional. In this space, we embed the operator
version of $\xi$ as

$$\mathfrak{D}(\mathcal{H}_\xi) \ni \Xi = \int d\xi\, \xi\, |\xi\rangle\langle\xi| \tag{4.15}$$

where $\Xi|\xi\rangle = \xi|\xi\rangle$ and $\langle\xi|\xi'\rangle = \delta(\xi - \xi')$. Again, by extending operators appropriately,
the filters in Eq. (3.137) and Eq. (3.138) become optimal parameter estimation filters.
We generalize the conditional ensemble state of Eq. (4.6) to

$$\rho_t^E = \int d\xi\, p_t(\xi)\, |\xi\rangle\langle\xi| \otimes \rho_t^{(\xi)} \tag{4.16}$$

where $p_t(\xi) = P(\xi \,|\, M_{[0,t]})$ is the continuous conditional probability density. Although
the quantum filter provides an exact formula for the evolution of this density, calcu- 
lating it is impractical, as one cannot exactly represent the continuous distribution 
on a computer. The obvious approximation is to discretize the space of parameter 
values and then use the ensemble filter determined by Eq. (4.7); indeed such an ap- 
proach is very common in classical filtering theory and encompasses a broad set of 
Monte Carlo methods called particle filters [Arulampalam et al. 2002; Doucet et al. 
2001]. 

The inspiration for particle filtering comes from noting that any distribution can
be approximated by a weighted set of point masses or particles. In the quantum
case, we introduce a quantum particle approximation of the conditional density in
Eq. (4.16) as

$$p_t(\xi) \approx \sum_{i=1}^{N} p_t^{(i)}\, \delta(\xi - \xi_i). \tag{4.17}$$

The approximation can be made arbitrarily accurate in the limit of $N \to \infty$. Plugging
this into Eq. (4.16), we recover precisely the form for the discrete conditional
state given in Eq. (4.6). Accordingly, the quantum particle filtering equations are
identical to those of the ensemble filter given in Eq. (4.7). The only distinction here
is in the initial approximation of the space of parameter values. Thus the basic
quantum particle filter simply involves discretizing the parameter space, then integrating
the filter according to the ensemble filtering equations.

The basic particle filter suffers from a degeneracy problem, in that all but a few
particles may end up with negligible weights $p_t^{(i)}$. This problem is even more relevant
when performing parameter estimation, since the set of possible values for $\xi$ is fixed
at the outset by the choice of discretization. Even if a region in parameter space has
low weights, its particles take up computational resources, but contribute little to the
estimate of $\xi$. More importantly, the ultimate precision of the parameter estimate is
inherently limited by the initial discretization; we can never have a particle whose
parameter value is any closer to the true value $\xi$ than the closest initial discretized
value.

In order to circumvent these issues, we can adopt the kernel resampling techniques 
of Liu and West [Liu and West 2001]. The idea is to replace low weight particles with 
new ones concentrated in high weight regions of parameter space. One first samples 
a source particle from the discrete distribution given by the weights $\{p_t^{(i)}\}$, ensuring
new particles come from more probable regions of parameter space. Given a source
particle, we then create a child particle by sampling from a Gaussian kernel centered
near the source particle. By repeating this procedure $N$ times, we create a new set
of particles which populate more probable regions of parameter space. Over time, 
this adaptive procedure allows the filter to move away from unimportant regions of 
parameter space and more finely explore the most probable parameter values. 

The details of the adaptive filter lie in parameterizing and sampling from the
Gaussian kernel. Essentially, we are given a source particle, characterized by $\xi_i$
and $\rho_t^{(i)}$, and using the kernel, create a child particle, characterized by $\tilde{\xi}_i$ and
$\tilde{\rho}_t^{(i)}$. One could attempt to sample from a multi-dimensional Gaussian over both
the parameter and atomic state components, but ensuring that the sampled $\tilde{\rho}_t^{(i)}$ is a
valid atomic state would be non-trivial in general. There will be some cases, including
the qubit example in the following section, where the atomic state is conveniently
parameterized for Gaussian resampling. But for clarity in presenting the general
filter, we will create a child particle with the same atomic state as the parent particle.

Under this assumption, the Gaussian kernel for parent particle $i$ is characterized
by a mean $\mu^{(i)}$ and variance $\sigma^{2(i)}$, both defined over the one dimensional parameter
space. Rather than setting the mean of this kernel to the parameter value of the
parent, Liu and West suggest setting

$$\mu^{(i)} = a\,\xi_i + (1 - a)\,\bar{\xi}, \qquad a \in [0, 1] \tag{4.18}$$

where $\bar{\xi} = \sum_i p_t^{(i)} \xi_i$ is the ensemble mean. The parameter $a$ is generally taken to be
close to one and serves as a mean reverting factor. This is important because simply
resampling from Gaussians centered at $\xi_i$ results in an overly dispersed ensemble
relative to the parent ensemble. The kernel variance is set to

$$\sigma^{2(i)} = h^2 V_t, \qquad h \in [0, 1] \tag{4.19}$$

where $V_t = \sum_i p_t^{(i)} (\xi_i - \bar{\xi})^2$ is the ensemble variance and $h$ is the smoothing parameter.
It is generally a small number chosen to scale with $N$, so as to control how much kernel
sampling explores parameter space. While $a$ and $h$ can be chosen independently, Liu
and West relate them by $h^2 = 1 - a^2$, so that the new sample does not have an
increased variance.
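The claim that the relation $h^2 = 1 - a^2$ avoids inflating the variance can be verified
directly: the resampled ensemble is a mixture of Gaussians with means $\mu^{(i)}$ and common
variance $h^2 V_t$, so its variance is $a^2 V_t + h^2 V_t$. The sketch below evaluates
Eq. (4.18) and Eq. (4.19) and this mixture variance; the numerical values and the
function name `liu_west_kernel` are illustrative assumptions.

```python
import numpy as np

def liu_west_kernel(xi, p, a):
    """Kernel means (Eq. 4.18) and common variance (Eq. 4.19) with h^2 = 1 - a^2."""
    xi = np.asarray(xi, dtype=float)
    p = np.asarray(p, dtype=float)
    xi_bar = np.dot(p, xi)                    # ensemble mean
    V = np.dot(p, (xi - xi_bar) ** 2)         # ensemble variance V_t
    h2 = 1.0 - a ** 2                         # Liu-West relation between a and h
    mu = a * xi + (1.0 - a) * xi_bar          # shrunken kernel means
    return mu, h2 * V, xi_bar, V

# Hypothetical three-particle ensemble.
xi = [1.0, 2.0, 4.0]
p = [0.2, 0.5, 0.3]
mu, kernel_var, xi_bar, V = liu_west_kernel(xi, p, a=0.98)

# Mean and variance of the Gaussian mixture from which children are drawn.
mixture_mean = np.dot(p, mu)
mixture_var = np.dot(p, (mu - mixture_mean) ** 2) + kernel_var
```

Choosing $h$ independently of $a$, as is done in the qubit example later in this
chapter, gives up this exact matching of the variance in exchange for direct control
over how far children stray from their parents.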

Of course, it would be computationally inefficient to perform this resampling
strategy at every timestep, especially since there will be many steps where most
particles have non-negligible contributions to the parameter estimate. Instead, we
should only resample if some undesired level of degeneracy is reached. As discussed
by Arulampalam et al. [Arulampalam et al. 2002], one measure of degeneracy is the
effective sample size

$$N_{\mathrm{eff}} = \frac{1}{\sum_{i=1}^{N} \left(p_t^{(i)}\right)^2}. \tag{4.20}$$

At each timestep, we then resample if the ratio $N_{\mathrm{eff}}/N$ is below some given threshold.
We are not aware of an optimal threshold to choose in general, but the literature
suggests 2/3 as a rule of thumb [Doucet et al. 2001].
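The degeneracy test is a one-line computation on the weights. A minimal sketch,
assuming the inverse-sum-of-squares form of the effective sample size and using
hypothetical function names, is:

```python
import numpy as np

def effective_sample_size(p):
    """Effective sample size: inverse of the sum of squared normalized weights."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()                  # tolerate weights that are not quite normalized
    return 1.0 / np.sum(p ** 2)

def needs_resampling(p, threshold=2.0 / 3.0):
    """True when N_eff / N falls below the chosen threshold."""
    return effective_sample_size(p) / len(p) < threshold

ess_uniform = effective_sample_size([0.25, 0.25, 0.25, 0.25])
ess_degenerate = effective_sample_size([1.0, 0.0, 0.0, 0.0])
```

Uniform weights give the largest possible effective sample size, while a single
dominant particle drives it toward one.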

Altogether, the resampling quantum particle filter algorithm proceeds as follows:

Initialization for $i = 1, \ldots, N$:

1. Sample $\xi_i$ from the prior parameter distribution.

2. Create a quantum particle with weight $p_0^{(i)} = 1/N$, parameter state $|\xi_i\rangle\langle\xi_i|$
and atomic state $\rho_0^{(i)} = \rho_0$, where $\rho_0$ is the known initial atomic state.

Repeat for all time:

1. Update the particle ensemble by integrating a timestep of the filter given
in Eq. (4.7).

2. If $N_{\mathrm{eff}}/N$ is less than the target threshold, create a new particle ensemble:
Resample for $i = 1, \ldots, N$:

(a) Sample a parent index $j$ from the discrete density $\{p_t^{(j)}\}$.

(b) Sample a new parameter value $\tilde{\xi}_i$ from the Gaussian kernel with
mean $\mu^{(j)}$ and variance $\sigma^{2(j)}$ given by Eq. (4.18) and Eq. (4.19).

(c) Add a quantum particle to the new ensemble with weight $\tilde{p}_t^{(i)} = 1/N$,
parameter state $|\tilde{\xi}_i\rangle\langle\tilde{\xi}_i|$ and atomic state $\tilde{\rho}_t^{(i)} = \rho_t^{(j)}$.
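The resampling stage of the algorithm above can be sketched in a few lines. This is
only an outline under stated assumptions: atomic states are represented as NumPy
arrays, the child inherits a copy of the parent's atomic state, and the function name
`resample`, its default arguments and the toy ensemble are all hypothetical.

```python
import numpy as np

def resample(xi, p, states, a=0.98, h=0.1, rng=None):
    """One resampling stage: parents by weight, children from the Gaussian kernel."""
    rng = np.random.default_rng() if rng is None else rng
    xi = np.asarray(xi, dtype=float)
    p = np.asarray(p, dtype=float)
    N = len(xi)
    xi_bar = np.dot(p, xi)                          # ensemble mean
    V = np.dot(p, (xi - xi_bar) ** 2)               # ensemble variance
    parents = rng.choice(N, size=N, p=p)            # step (a): sample parent indices
    mu = a * xi[parents] + (1.0 - a) * xi_bar       # kernel means, Eq. (4.18)
    new_xi = rng.normal(mu, h * np.sqrt(V))         # step (b): variance h^2 V, Eq. (4.19)
    new_states = [states[j].copy() for j in parents]  # step (c): inherit atomic state
    new_p = np.full(N, 1.0 / N)                     # step (c): uniform weights
    return new_xi, new_p, new_states, parents

# Toy ensemble: 50 particles on [0, 10] with weights peaked near 5.
rng = np.random.default_rng(0)
xi0 = np.linspace(0.0, 10.0, 50)
p0 = np.exp(-0.5 * (xi0 - 5.0) ** 2)
p0 /= p0.sum()
states0 = [np.diag([k / 49.0, 1.0 - k / 49.0]) for k in range(50)]
new_xi, new_p, new_states, parents = resample(xi0, p0, states0, rng=rng)
```

In a full filter this function would be called only when the effective sample size
ratio drops below the threshold, with the ensemble then propagated as before.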



Unfortunately, checking asymptotic convergence of the filter is more involved in
the continuous-valued case, as the observability and absolute continuity conditions
require extra care in infinite dimensions. However, given that the quantum particle
filter actually works on a discretized space, in practice we can simply use the results
we had for the finite-dimensional case. As before, we note that one can generalize the 
quantum particle filter to multidimensional parameters by using a multi-dimensional 
Gaussian kernel. One might also consider using alternate kernel forms, such as a 
regular grid which has increasingly finer resolution with each resampling stage. We 
will not consider such extensions here. 

Qubit Example 

We now consider a resampling quantum particle filter for the qubit magnetometer
introduced earlier in this chapter. As hinted at in the previous section, since the qubit
state is parameterized by the continuous variable $\theta_t$, we can easily resample both the
magnetic field $B_i$ and state using a two-dimensional Gaussian kernel for $(B_i, \theta_t^{(i)})$,
with mean vector and covariance matrix given by generalizations of Eq. (4.18) and
Eq. (4.19). Since different values of $B$ result in different state evolutions, resampling
both the state and magnetic field values should result in child particles that are closer
to the true evolved state.

Figure 4.3 shows a typical run of the quantum particle filter for $N = 1000$ particles.
The true $B$ value was $5\kappa$ and the prior distribution over $B$ was taken to be
uniform over the interval $[0, 10\kappa]$. As before, I used an Ito-Euler integrator with a
step-size of dt = 10~^$\kappa$. Note that both the timespan of integration and the potential
values of $B$ range from 0 to $10\kappa$ in our units. The resampling parameters were
a = 0.98, h = 10"^ and resampling threshold 2/3. Note that I chose not to use Liu
and West's relation between $a$ and $h$.

In order to generate the figure, each particle's weight and parameter values were
stored at 50 equally spaced times over the integration timespan. Using Matlab's
ksdensity function, these samples were then used to reconstruct $p_t(B)$ via a Gaussian
kernel density estimate of the distribution. The resulting kernel density estimate
was then evaluated at 150 equally spaced $B$ values in the range $[0, 10\kappa]$, which
I plotted as $p_t(B)dB$ with $dB = 10\kappa/150$. As is seen in the figure, after some initial
multi-modal distributions over parameter space, the filter homes in on the true
value of $B = 5\kappa$. For the simulation shown, the final estimate was $B = 5.03\kappa$ with
uncertainty $\sigma_B = 0.18\kappa$. The filter resampled 7 times over the course of integration.

Figure 4.3: Kernel density reconstruction of $p_t(B)dB = P(B \,|\, M_{[0,t]})dB$ for
$N = 1000$ particle filter set with $dB = 10\kappa/150$, a = 0.98, h = 10~^ and
resampling threshold of 2/3. The true magnetic field was $B = 5\kappa$.
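For readers without access to Matlab, the reconstruction step can be approximated
with a weighted Gaussian kernel density estimate written directly in NumPy. The sketch
below is not a reimplementation of ksdensity: the bandwidth is supplied by hand rather
than selected automatically, and the particle cloud is synthetic stand-in data drawn
for illustration only.

```python
import numpy as np

def weighted_gaussian_kde(samples, weights, grid, bandwidth):
    """Weighted sum of Gaussian kernels centered on the samples, evaluated on grid."""
    samples = np.asarray(samples, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    grid = np.asarray(grid, dtype=float)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernels @ weights

# Synthetic stand-in for a stored particle snapshot (units of kappa).
rng = np.random.default_rng(1)
B_particles = rng.normal(5.0, 0.2, size=1000)
weights = np.full(1000, 1.0 / 1000)

grid = np.linspace(0.0, 10.0, 150)          # 150 equally spaced B values
dB = grid[1] - grid[0]
density = weighted_gaussian_kde(B_particles, weights, grid, bandwidth=0.1)

# Point estimate and uncertainty taken directly from the weighted particles.
B_est = np.dot(weights, B_particles)
sigma_B = np.sqrt(np.dot(weights, (B_particles - B_est) ** 2))
```

Plotting `density * dB` against `grid` at each stored time would give a figure of the
same kind as Figure 4.3, up to the bandwidth choice.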



4.4 Summary

I have presented practical methods for single-shot parameter estimation via continuous
quantum measurement. By embedding the parameter estimation problem in
the standard quantum filtering problem, the optimal parameter filter is given by an
extended form of the standard quantum filtering equation. For parameters taking
values in a finite set, I gave conditions for determining whether the parameter filter
will asymptotically converge to the correct value. For parameters taking values
from an infinite set, I introduced the quantum particle filter as a computational tool
for suboptimal estimation. Throughout, I presented numerical simulations of the
methods using a single qubit magnetometer.

These techniques should generalize straightforwardly to estimating time-dependent
parameters and, to a lesser extent, to estimating initial state parameters. The binary
state discrimination problem studied by [Jacobs and Steck 2006] is one such example
and their approach is essentially a special case of our ensemble parameter filter.
Future extensions of this work include exploring alternate resampling techniques for
the quantum particle filter, considering alternative discretization schemes beyond
the delta function particle basis and developing feedback strategies for improving
the parameter estimate.

Chapter 5



Precision Magnetometry



In this chapter, I review the application of the parameter estimation techniques
developed in Chapter 4 for a proposed experimental demonstration of precision
magnetometry. By double-passing an optical field through an atomic system, one hopes to
create effective nonlinear interactions which offer improved sensitivity to the strength
of an external magnetic field. Using the quantum stochastic formalism of Chapter 3, I
review simulations of quantum information theoretic bounds on the optimal estimator
performance which suggest magnetic field uncertainty scalings better than that of
traditional atomic magnetometers, a conclusion further supported by simulations of
corresponding quantum particle filter parameter estimators. The research in this chapter
appears in [Chase et al. 2009a; Chase and Geremia 2009a; Chase et al. 2009b].



5.1 Introduction 



It is well-appreciated in physics that the properties of a field must often be determined
indirectly, such as by observing the effect of the field on a test particle. Take
magnetometry for example: the strength of a magnetic field might be inferred by
observing Larmor precession in a spin-polarized atomic sample [Budker et al. 2002]
and estimating the field strength $B$ from the precession rate. Inherent in this process
is the fact that the atomic spin must be measured to determine the extent of
the magnetically-induced dynamics. For very precise measurements, uncertainty $\delta\tilde{B}$
in the estimated value $\tilde{B}$ of the field is dominated by quantum fluctuations in the
observations performed on the atomic sample. The results presented here fall under
the umbrella of quantum parameter estimation theory [Braunstein and Caves 1994;
Helstrom 1976], where the objective is to work within the rules of quantum mechanics
to minimize, as much as possible, the propagation of this quantum uncertainty
into the determination of metrological quantities, like $B$.

Given, for instance, a y-axis magnetic field $\mathbf{B} = B\,\hat{y}$, an atomic sample couples
to $\mathbf{B}$ via the magnetic dipole Hamiltonian

$$H = -\hbar\gamma B F_y \tag{5.1}$$

where $\gamma$ is the atomic gyromagnetic ratio and $F_i = \sum_{j=1}^{N} f_i^{(j)}$ ($i = x, y, z$) are the
collective spin operators obtained from a symmetric sum over $N$ identical spin-$f$
atoms. If the atoms are initially polarized along the x-axis, the Larmor dynamics
and thus $B$ can be inferred by observing the z-component of the atomic spin
[Budker et al. 2002; Geremia et al. 2003; Kominis et al. 2003].

Through the quantum Cramér-Rao inequality [Braunstein and Caves 1994; Braunstein
et al. 1996; Helstrom 1976; Holevo 1982], it is possible to place an information-theoretic
lower bound on the units-corrected mean-square deviation of the estimate
$\tilde{B}$ from $B$,

(5.2)

The behavior of the estimator uncertainty with the number of atoms depends
on the characteristics (e.g., separable, entangled, etc.) of the quantum states used
to compute the expectation value in Eq. (5.2) as well as the nature of the induced
dynamics [Boixo et al. 2007]. If one does not permit quantum entanglement between
the different atoms in the probe, it can be shown that the optimal parameter resolution
obtained from Eq. (5.1) is given by the so-called shotnoise uncertainty [Budker
et al. 2002; Geremia et al. 2003]

$\delta B_{\mathrm{SN}}(t) =$ (5.3)

whose characteristic $1/\sqrt{N}$ scaling is a byproduct of the projection noise $\langle \Delta F_z^2 \rangle = F/2$
for a spin coherent state [Wineland et al. 1994] (here $F = fN$ for a sample
of $N$ atoms each with total spin quantum number $f$). It was believed for some time
that the fundamental limit to parameter estimation, even when exploiting arbitrary
entanglement between atoms in the probe, offers only a quadratic improvement

$\delta B_{\mathrm{HL}}(t) =$ (5.4)

up to an implementation-dependent constant a. Eq. (5.4) has traditionally been
called the Heisenberg uncertainty scaling, and it can be achieved in principle for
various spin resonance metrology problems [Wineland et al. 1994], including
magnetometry [Geremia et al. 2003]. For an ensemble of $N$ spin-1/2 particles prepared
into the initial cat-state $(|\uparrow\uparrow\cdots\uparrow\rangle + |\downarrow\downarrow\cdots\downarrow\rangle)/\sqrt{2}$, the uncertainty scaling is given
by $1/\gamma t N$ and is sometimes called the Heisenberg Limit.

Recently, however, it was shown that $1/N$ scaling can be surpassed [Boixo et al.
2007] by extending the linear coupling that underlies Eq. (5.1) to allow for multi-body
collective interactions [Boixo et al. 2007; Rey et al. 2007]. Were one to engineer
a probe Hamiltonian where $B$ multiplies $k$-body probe operators, such as $F_y^k$, then
the quantum Cramér-Rao bound [Braunstein and Caves 1994] indicates that the
optimal estimation uncertainty would scale more favorably as $\Delta B_k \sim 1/N^k$ [Boixo
et al. 2007]. Unfortunately, metrological coupling Hamiltonians are rarely up to
us; they come from nature, like the Zeeman interaction, which suggests that one is
stuck with a given uncertainty scaling without changing the fundamental structure
of Eq. (5.1). Furthermore, it was shown in Ref. [Boixo et al. 2007] that the addition
of an auxiliary parameter-independent Hamiltonian $H_1(t)$ such that

$$H = -\hbar\gamma B F_y + H_1(t) \tag{5.5}$$

does not change the scaling of the parameter uncertainty for any choice of $H_1(t)$.

At the same time, however, it should be well-appreciated that the dynamics one 
encounters in any actual physical setting are effective dynamics. Indeed, even the 
hyperfine Zeeman Hamiltonian of Eq. (5.1) is an effective description at some level.
This raises the question of whether one can utilize an auxiliary system to induce
effective dynamics that improve the uncertainty scaling in quantum parameter
estimation by going outside the structure of Eq. (5.5). The purpose of this chapter is
to provide some direct evidence that doing so is possible.

In particular, we will study effective nonlinear couplings generated by
double-passing an optical field through an atomic sample (q.v. Figure 5.1) Sarma et al.
[2008]; Sherson and Mølmer [2006]. Continuous measurement of the scattered field
then allows for the estimation of the atomic spin and, by extension, the magnetic
field. Building on the quantum stochastic calculus approach in Sarma et al. [2008],
I present the
quantum filtering equations for estimating the state of the atomic sample. Although 
the effective dynamics are no longer described by a Hamiltonian, numerical calcula- 
tions of the quantum Fisher information can be used to obtain a theoretical lower 
bound on the uncertainty scaling of an optimal magnetic field estimator Braunstein 
and Caves [1994]. Such simulations suggest that for certain parameter regimes, the 
double-pass system's sensitivity to magnetic fields scales better than that of a com- 
parable single-pass system and what would be computed by applying the methods 
of Ref. Boixo et al. [2007] to Eq. (5.5). Other simulations suggest that the quantum 
Heisenberg limit may be attained without generating any appreciable entanglement. 
I also review direct simulations of magnetic field estimation for the system using 
quantum particle filters as further evidence for the improved uncertainty scaling 
provided by our proposed magnetometer. 

Unfortunately, the results are somewhat muted by the fact that, despite our best
efforts, we have not found a parameter estimator whose uncertainty scaling can be
shown analytically to outperform the conventional Heisenberg limit. In particular,
I show that improved scaling is not achieved by a quantum Kalman filter Belavkin
[1999]; Kalman [1960]; Kalman and Bucy [1961], as such a filter is only suitable
for estimating magnetic fields in the linear small-angle regime, where the state is
Gaussian and the dynamics are well approximated by a low-order Holstein-Primakoff
expansion Geremia et al. [2003]; Holstein and Primakoff [1940]. Although Kalman
filters have had success in describing the single-pass system Geremia et al. [2003],
simulations suggest the Gaussian and small-angle approximations break down
precisely when exact simulations of the double-pass system show improved sensitivity.
For pedagogical purposes, I detail the derivation of such linear-Gaussian filters using
the method of projection filtering Mabuchi [2008]; van Handel and Mabuchi [2005b].
Doing so allows us to observe directly the limitations that arise when imposing the
small-angle and Gaussian assumptions, and it also provides a framework for the
future development of more sophisticated filters.

Figure 5.1: Schematic of a broadband atomic magnetometer based on continuous
observation of a polarized optical probe field double-passed through the
atomic sample.
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5.2 Continuous measurement of the double-pass 
system 

Consider the schematic in Fig. 5.1. The objective of this apparatus is to estimate 
the strength of a magnetic field oriented along the laboratory y-axis by observing 
the effect of that field on the spin state of the atomic sample. Like most atomic 
magnetometer configurations, our procedure relies upon Larmor precession and uses a 
far-detuned laser probe to observe the spin angular momentum of the atomic sample. 
Unlike conventional atomic magnetometer configurations, however, the probe laser 
is routed in such a way that it passes through the atomic sample twice prior to 
detection Sarma et al. [2008]; Sherson and Mølmer [2006].

Qualitatively, the magnetometer operates as follows. The incoming probe field
propagates initially along the atomic z-axis and is linearly polarized. As a result of
the atomic polarizability of the atoms, the probe laser polarization acquires a Faraday 
rotation proportional to the z-component of the collective atomic spin. Two folding 
mirrors are then used to direct the forward scattered probe field to pass through the 
atomic sample a second time, now propagating along the atomic y-axis. Prior to 
its second interaction with the atoms, polarization optics convert the initial Faraday 
rotation into ellipticity. Thus on the second pass, the atoms perceive the optical 
helicity as a fictitious y-axis magnetic field acting in addition to the real field B, 
providing a positive feedback effect modulated by the strength of B. The twice 
forward-scattered optical field is then measured in such a way that is sensitive only 
to the Faraday rotation induced by the first pass atom-field interaction. 

5.2.1 Quantum Stochastic Model 

When the collective spin angular momentum of a multilevel atomic system interacts
dispersively with a traveling-wave laser field with wavevector $k$, the atomic spin
couples to the two polarization modes of the electromagnetic field transverse to $k$.
These polarization modes can be viewed as a Schwinger-Bose field that, when
quantized in terms of a plane-wave mode decomposition, yields the familiar Stokes
operators, Eqs. (5.6)-(5.9).

[Eqs. (5.6)-(5.9), the explicit Stokes operator definitions, are not recoverable from the source text.]

Here, we have expressed the Stokes operators in terms of the Schrödinger-picture field
annihilation operators, $a_{x,\omega}$ and $a_{y,\omega}$, for the plane-wave modes with frequency
$\omega$ and linear polarization along the x- and y-axes, respectively, as well as their
corresponding transformations into the spherical polarization basis.

In developing a physical model for the atom-field interaction in Fig. 5.1, it is
convenient to transform from a plane-wave mode decomposition of the electromagnetic
field to operators that are labeled by time. Toward this end, we define the
time-domain Schwinger boson annihilation operator as the operator distribution

st = -j g{u)al^d^^^e^-'du;, (5.10) 

where $g(\omega)$ is a form factor. This definition permits us to express the Stokes
operators as

$$ s_{z,t} = i\left(s_t^\dagger - s_t\right) \quad\text{and}\quad s_{y,t} = -\left(s_t^\dagger + s_t\right), \tag{5.11} $$

which should be reminiscent of quadrature operators and also places the field
operators in a form that is directly in line with the standard nomenclature adopted in
the field of quantum stochastic calculus.

With a suitable orientation of the polarization optics ($\lambda/2$ and $\lambda/4$) in Fig. 5.1,
the interaction Hamiltonians for each pass of the probe light through the sample are
then

$$ H_t^{(1)} = \hbar\sqrt{\mu}\, F_z\, s_{z,t} \tag{5.12} $$

and

$$ H_t^{(2)} = \hbar\sqrt{\kappa}\, F_y\, s_{y,t}, \tag{5.13} $$

respectively. Note that in developing these Hamiltonians, which are of the standard
atomic polarizability form, it was assumed that rank-two spherical tensor interactions
Geremia et al. [2006]; Smith et al. [2004] can be neglected. In practice, the validity
of such an assumption can depend heavily on the choice of atomic level structure
and experimental parameters such as the intensity and detuning of the probe laser
field.

In addition to specifying the Hamiltonians for the two atom-field interactions, it
is also necessary to stipulate the measurement to be performed on the probe laser.
Since we expect that the amount of Larmor precession (possibly augmented by the
addition of the double-passed probe field) will carry information about the magnetic
field strength $B$, we must choose the measured field operator $Z_t$ appropriately. Since
the magnetic field drives rotations about the atomic y-axis, it is the z-component
of the atomic spin that indicates such a rotation. From the form of the first-pass
interaction Hamiltonian $H_t^{(1)}$, we see that the z-component of the atomic spin couples
to dynamics generated by the field operator $s_{z,t} = i(s_t^\dagger - s_t)$. The effect of such a
coupling is then observed by measuring the orthogonal quadrature, indicating that
the appropriate polarization measurement should be $Z_t = s_{y,t}$.
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The Stochastic Propagator and Quantum Filter 

Analyzing the two individual interactions $H_t^{(1)}$ and $H_t^{(2)}$ via the stochastic limit
studied in Chapter 3 gives rise to the following quantum stochastic differential
equations (QSDE) for the interaction-picture propagators

$$ dU_t^{(1)} = \left\{ \sqrt{m}\, F_z \left(dS_t^\dagger - dS_t\right) - \frac{1}{2} m F_z^2\, dt - \frac{i}{\hbar} H\, dt \right\} U_t^{(1)} \tag{5.14} $$

$$ dU_t^{(2)} = \left\{ i\sqrt{k}\, F_y \left(dS_t + dS_t^\dagger\right) - \frac{1}{2} k F_y^2\, dt - \frac{i}{\hbar} H\, dt \right\} U_t^{(2)} \tag{5.15} $$

where $m$ and $k$ are the weak-coupling interaction strengths obtained from the rates
$\mu$ and $\kappa$, $H$ is an arbitrary atomic Hamiltonian, and $dS_t^\dagger$ and $dS_t$ are
delta-correlated noise operators derived from the quantum Brownian motion

$$ S_t = \int_0^t s_u\, du. \tag{5.16} $$

The noise terms satisfy the quantum Ito rules, $dS_t\, dS_t^\dagger = dt$ and
$dS_t^\dagger dS_t = dS_t^2 = (dS_t^\dagger)^2 = 0$, and can be viewed heuristically as a consequence of
vacuum fluctuations in the probe field.

To obtain a single weak-coupling limit for the double-pass interaction, we combine
the separate equations of motion for the two propagators as follows. First, write the
two single-pass evolutions in terms of the generators of the dynamics,

$$ dU_t^{(1)} = a_t U_t^{(1)} \quad\text{and}\quad dU_t^{(2)} = b_t U_t^{(2)}, \tag{5.17} $$

and then expand the differential $dU_t$ of the combined propagator

$$ U_{t+\delta t} = (1 + b_t)(1 + a_t)\, U_t \tag{5.18} $$
$$ \phantom{U_{t+\delta t}} = U_t + (a_t + b_t + b_t a_t)\, U_t \tag{5.19} $$

such that the combined propagator differential $dU_t = U_{t+\delta t} - U_t$ then satisfies

$$ dU_t = (a_t + b_t + b_t a_t)\, U_t. \tag{5.20} $$

After evaluating the combined evolution for the propagators in Eqs. (5.14) and (5.15)
in light of the quantum Ito rules, we find that the single weak-coupling limit
propagator satisfies

$$ dU_t = \Big[\, i\sqrt{km}\, F_y F_z\, dt - \frac{1}{2} m F_z^2\, dt - \frac{1}{2} k F_y^2\, dt - \frac{2i}{\hbar} H\, dt
 + \sqrt{m}\, F_z \left(dS_t^\dagger - dS_t\right) + i\sqrt{k}\, F_y \left(dS_t^\dagger + dS_t\right) \Big]\, U_t. \tag{5.21} $$
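The only term in the combined generator that is not a plain sum of the single-pass generators is the product of the two. As a worked check, label the first- and second-pass generators of Eqs. (5.14) and (5.15) by $a_t$ and $b_t$ (labels assumed here for illustration). Every product containing a factor of $dt$ is of higher order, so only the noise-noise product survives, and the quantum Ito rules leave a single contribution:

```latex
\begin{aligned}
b_t a_t
 &= i\sqrt{km}\, F_y F_z \,\bigl(dS_t + dS_t^\dagger\bigr)\bigl(dS_t^\dagger - dS_t\bigr) \\
 &= i\sqrt{km}\, F_y F_z \,\bigl(dS_t\, dS_t^\dagger - dS_t^2 + (dS_t^\dagger)^2 - dS_t^\dagger dS_t\bigr) \\
 &= i\sqrt{km}\, F_y F_z \, dt ,
\end{aligned}
\qquad
a_t + b_t \;\ni\; -\frac{2i}{\hbar} H\, dt .
```

The first line is the origin of the effective nonlinear term, and the doubled Hamiltonian contribution in the sum $a_t + b_t$ is the factor of two in the rates that is discussed next.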



Observe that, as a result of the manner in which the combined weak-coupling limit
was taken, the rates that appear in the Hamiltonian term differ by a factor of two
from those that would be expected from a single weak-coupling limit. This factor of
two is essentially the rescaling of time units that arises from aggregating two
sequential weak-coupling limits as a single differential process. To retain consistency
with the original definition of the frequencies that appear in the parameter-coupling
Hamiltonian, it is essential to rescale time units such that frequencies in the
parameter-coupling Hamiltonian are as expected. Doing so is accomplished by
reversing the effective $2dt \to dt$ transformation that occurred in the derivation, and
thus dividing all rates by two to give

$$ dU_t = \Big[\, i\sqrt{KM}\, F_y F_z\, dt - \frac{1}{2} M F_z^2\, dt - \frac{1}{2} K F_y^2\, dt - \frac{i}{\hbar} H\, dt
 + \sqrt{M}\, F_z \left(dS_t^\dagger - dS_t\right) + i\sqrt{K}\, F_y \left(dS_t^\dagger + dS_t\right) \Big]\, U_t \tag{5.22} $$

where $M = m/2$ and $K = k/2$. I note that this final result agrees with the propagator
obtained by Sarma et al. [2008], who also derived the quantum stochastic
propagator of this system in order to characterize the generation of polarization and
spin squeezing as suggested by Sherson and Mølmer [2006].

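Following the derivation of the quantum filter in Chapter 3, we recognize the
dipole operator $L = \sqrt{M} F_z + i\sqrt{K} F_y$ and Hamiltonian
$H = -\hbar\gamma B F_y - \hbar\sqrt{KM}\,(F_z F_y + F_y F_z)/2$ in comparing the double-pass propagator of
Eq. (5.22) to the general form of Eq. (3.116). Plugging these forms into the adjoint
filter of Eq. (3.138) yields the double-pass quantum filter

$$ d\rho_t = i\gamma B\,[F_y, \rho_t]\, dt + i\sqrt{KM}\,[F_y, \{F_z, \rho_t\}]\, dt
 + M\, \mathcal{D}[F_z]\rho_t\, dt + K\, \mathcal{D}[F_y]\rho_t\, dt
 + \left( \sqrt{M}\, \mathcal{M}[F_z]\rho_t + i\sqrt{K}\,[F_y, \rho_t] \right) dW_t \tag{5.23} $$

where the innovations process

$$ dW_t = dZ_t - 2\sqrt{M}\, \mathrm{Tr}[F_z \rho_t]\, dt \tag{5.24} $$

is a Wiener process, i.e. $E[dW_t] = 0$, $dW_t^2 = dt$. The various superoperators are
defined as

$$ \mathcal{D}[F_k]\rho_t = F_k \rho_t F_k^\dagger - \frac{1}{2} F_k^\dagger F_k \rho_t - \frac{1}{2} \rho_t F_k^\dagger F_k \tag{5.25} $$

$$ \mathcal{M}[F_z]\rho_t = F_z \rho_t + \rho_t F_z - 2\, \mathrm{Tr}[F_z \rho_t]\, \rho_t \tag{5.26} $$

$$ \{F_z, \rho_t\} = F_z \rho_t + \rho_t F_z \tag{5.27} $$

One other form which is useful when the quantum state remains pure is the
stochastic Schrödinger equation (SSE). As developed in Appendix C, the SSE for
the double-pass quantum filter is

$$ d|\psi\rangle_t = \Big( i\gamma B F_y - \frac{M}{2}\left(F_z - \langle F_z\rangle_t\right)^2
 + i\sqrt{KM}\, F_y \left(F_z + \langle F_z\rangle_t\right) - \frac{K}{2} F_y^2 \Big) |\psi\rangle_t\, dt
 + \Big( \sqrt{M}\left(F_z - \langle F_z\rangle_t\right) + i\sqrt{K} F_y \Big) |\psi\rangle_t\, dW_t. \tag{5.28} $$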
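To make the structure of the stochastic Schrödinger equation concrete, here is a minimal sketch of a single Euler-Maruyama update, assuming $\hbar = 1$, a drift $i\gamma B F_y - \frac{M}{2}(F_z - \langle F_z\rangle)^2 + i\sqrt{KM}F_y(F_z + \langle F_z\rangle) - \frac{K}{2}F_y^2$ and a diffusion term $\sqrt{M}(F_z - \langle F_z\rangle) + i\sqrt{K}F_y$. The function names `spin_ops` and `sse_step` are hypothetical, and the explicit renormalization after each step is a numerical convenience rather than part of the derivation; a production integrator would likely use a higher-order scheme.

```python
import numpy as np

def spin_ops(F):
    """Spin-F matrices (Fx, Fy, Fz) with hbar = 1, basis ordered m = F, ..., -F."""
    m = np.arange(F, -F - 1, -1)
    Fz = np.diag(m).astype(complex)
    Fp = np.zeros((len(m), len(m)), dtype=complex)
    for i in range(1, len(m)):
        Fp[i - 1, i] = np.sqrt(F * (F + 1) - m[i] * (m[i] + 1))
    Fm = Fp.conj().T
    return (Fp + Fm) / 2, (Fp - Fm) / 2j, Fz

def sse_step(psi, B, dW, dt, ops, gamma=1.0, M=1.0, K=0.0):
    """One Euler-Maruyama step of the double-pass SSE, followed by renormalization."""
    Fx, Fy, Fz = ops
    eye = np.eye(len(psi))
    ez = np.vdot(psi, Fz @ psi).real          # conditional expectation <Fz>_t
    dFz = Fz - ez * eye
    drift = (1j * gamma * B * Fy
             - 0.5 * M * dFz @ dFz
             + 1j * np.sqrt(K * M) * Fy @ (Fz + ez * eye)
             - 0.5 * K * Fy @ Fy)
    diffusion = np.sqrt(M) * dFz + 1j * np.sqrt(K) * Fy
    new = psi + (drift @ psi) * dt + (diffusion @ psi) * dW
    return new / np.linalg.norm(new)

# Usage: a short trajectory starting from the spin coherent state along +x.
F = 2
ops = spin_ops(F)
psi = np.linalg.eigh(ops[0])[1][:, -1]
rng = np.random.default_rng(0)
dt = 1e-3
for _ in range(100):
    psi = sse_step(psi, B=0.1, dW=rng.normal(0.0, np.sqrt(dt)), dt=dt, ops=ops, M=1.0, K=0.1)
```

Setting `M = K = 0` removes the measurement entirely, in which case the update reduces to Larmor precession about the y-axis and $\langle F_x\rangle$ follows $F\cos(\gamma B t)$.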



5.3 The Quantum Cramér-Rao Inequality

In order to characterize the performance of the magnetometer, we may consider
quantum information theoretic bounds on the units-corrected mean-square deviation
of the magnetic field estimate $\tilde{B}$ from the true magnetic field $B$ Braunstein and Caves
[1994]; Braunstein et al. [1996], given in Eq. (5.2). The quantum Cramér-Rao bound
Braunstein and Caves [1994]; Braunstein et al. [1996]; Helstrom [1976]; Holevo [1982]
states that the deviation of any estimator is constrained by

$$ \delta\tilde{B} \ge \frac{1}{\sqrt{\mathcal{I}_B(t)}}, \qquad \mathcal{I}_B(t) = \mathrm{Tr}\left[\rho_B(t)\, \mathcal{L}_B^2(t)\right], \tag{5.29} $$

where the quantum Fisher information $\mathcal{I}_B(t)$ is the expectation of the square of the
symmetric logarithmic derivative operator, defined implicitly as

$$ \frac{\partial \rho_B(t)}{\partial B} = \frac{1}{2}\left(\mathcal{L}_B(t)\rho_B(t) + \rho_B(t)\mathcal{L}_B(t)\right). \tag{5.30} $$

For pure states, $\rho_B^2 = \rho_B$, so that

$$ \mathcal{L}_B(t) = 2\, \frac{\partial \rho_B(t)}{\partial B}, \tag{5.31} $$

which indicates

$$ \mathcal{I}_B(t) = 4\, \mathrm{Tr}\left[\rho_B(t) \left(\frac{\partial \rho_B(t)}{\partial B}\right)^2\right]. \tag{5.32} $$

In this form, we see that the lower bound is related to the sensitivity of the evolved
state to the magnetic field parameter. That is, any estimator's performance is
constrained by how well the dynamics transform differences in the value of $B$ into
differences in Hilbert space.
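The pure-state simplification follows in one line from differentiating the purity condition; the short derivation below spells out that step and, as an illustration under the assumption of purely unitary evolution $|\psi_B\rangle = e^{i\gamma B t F_y}|\psi_0\rangle$ (which is not the conditioned double-pass dynamics), evaluates the result.

```latex
\begin{aligned}
\rho_B^2 = \rho_B
 \;&\Longrightarrow\;
 \frac{\partial\rho_B}{\partial B}
   = \frac{\partial\rho_B}{\partial B}\,\rho_B + \rho_B\,\frac{\partial\rho_B}{\partial B}
   = \frac{1}{2}\Bigl[\Bigl(2\frac{\partial\rho_B}{\partial B}\Bigr)\rho_B
       + \rho_B\Bigl(2\frac{\partial\rho_B}{\partial B}\Bigr)\Bigr], \\
 \mathcal{I}_B
 &= 4\,\mathrm{Tr}\Bigl[\rho_B\Bigl(\frac{\partial\rho_B}{\partial B}\Bigr)^{2}\Bigr]
  = 4\Bigl(\langle\partial_B\psi_B|\partial_B\psi_B\rangle
       - |\langle\psi_B|\partial_B\psi_B\rangle|^{2}\Bigr), \\
 |\psi_B\rangle = e^{i\gamma B t F_y}|\psi_0\rangle
 \;&\Longrightarrow\;
 \mathcal{I}_B = 4\gamma^{2}t^{2}\,\langle\Delta F_y^{2}\rangle .
\end{aligned}
```

Comparing the first line with the implicit definition of the symmetric logarithmic derivative identifies it as twice the derivative of the state. In the unitary example, the sensitivity of the state to $B$ is set entirely by the variance of the generator $F_y$.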

As discussed by Boixo et al. [2007], for Hamiltonian evolution, the
quantum Cramér-Rao bound may be expressed in terms of the operator semi-norm,
which is the difference between the largest and smallest (non-degenerate) eigenvalues
of the probe Hamiltonian. For the magnetic dipole Hamiltonian in Eq. (5.1), this
bound is simply the Heisenberg limit in Eq. (5.4). More generally, the authors show
that a probe Hamiltonian which involves $k$-body operators gives rise to an uncertainty
scaling of $1/tF^k$. They further argue that no ancillary quantum systems or auxiliary
Hamiltonians contribute to this bound; it is determined solely by the Hamiltonian
that directly involves the parameter of interest.

Such analysis suggests the double-pass quantum system, whose only direct
magnetic field coupling is in the magnetic dipole Hamiltonian, should show no more
sensitivity than a single-pass system. There are several reasons why one might
believe there is more to the story. Firstly, the unitary evolution of the joint atom-field
system in Eq. (5.22) involves an auxiliary system of infinite dimension. As such,
it is not clear that the arguments leading to the operator semi-norm are valid, in
particular due to the fact that the white noise terms $dS_t$, $dS_t^\dagger$ are singular.
Additionally, the double-pass limit is a Markov one, in which the interaction the light field
mediates between atoms is essentially instantaneous relative to other time-scales in
the problem. The effective interaction is therefore fundamentally different than one
in which measurements of a finite-dimensional ancilla system are used to modulate
the evolution of the probe atoms. Thus the conditioned system, given in terms of
the quantum filter of Eq. (5.28), does not correspond to unitary dynamics. Indeed,
looking at Eq. (5.28), we see that the local generator of dynamics is path-dependent,
given in terms of the expectation of $F_z$. Therefore, as the magnetic field directly
impacts the state through the magnetic dipole term, it also non-trivially modulates
future dynamics through a state-dependent generator.

5.3.1 Numerical Analysis of the Quantum Fisher Information

Unfortunately, it is not clear how to fold the quantum stochastic or quantum filtered
dynamics analytically into the semi-norm bound considered in Boixo et al. [2007].
Nonetheless, the quantum Cramér-Rao bound in Eq. (5.32) is excellent fodder for
computer simulation. By numerically integrating the stochastic Schrödinger form of
the quantum filter in Eq. (5.28), a finite difference approximation of $\partial\rho_B(t)/\partial B$ may
be evaluated for different collective spin sizes $F$. That is, for a given choice of $F$, a
finite difference approximation of the quantum Fisher information near $B = 0$ can
be constructed by co-evolving three trajectories, $\rho_0(\tau)$, $\rho_{\delta B}(\tau)$, and $\rho_{-\delta B}(\tau)$ (seeded
by the same noise realization), and calculating

$$ \mathcal{I}_\tau | Z_\tau \approx 4\, \mathrm{Tr}\left[ \left( \frac{\rho_{\delta B}(\tau) - \rho_{-\delta B}(\tau)}{2\, \delta B} \right)^2 \rho_0(\tau) \right]. \tag{5.33} $$

As is suggestively written, the Fisher information is calculated on the particular
measurement realization that generated $\rho_t$ and must be averaged over many realizations
to obtain the unconditional quantum Fisher information $\mathcal{I}_t = E[\mathcal{I}_t | Z_t]$. The lower
bound $\delta\tilde{B}_\tau$ can then be obtained from Eq. (5.32) with statistical errorbars given by
$\sigma(\delta\tilde{B}_\tau) = \mathcal{I}_\tau^{-3/2}\, \sigma[\mathcal{I}_\tau | Z_t]/2$.

Figure 5.2: Comparison of the estimation uncertainty $\Delta B$ as a function of
the total atomic angular momentum (proportional to $N$) for double-pass and
single-pass atomic magnetometers determined by calculating the quantum Fisher
information with $M = 1$ (in units of $1/\tau$) and $K = 1 \times 10^{[\text{illegible}]}$ chosen to be optimal
for $F = [\text{illegible}]\hbar$.
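The finite-difference estimate and its error bar are straightforward to express in code. The sketch below is illustrative only: the function names `fisher_pure` and `bound_with_errorbar` are hypothetical, the states are assumed to be normalized pure-state vectors at the final time, and the error bar uses the sample standard deviation of the conditional Fisher information, following the error-propagation formula $\sigma(\delta\tilde{B}) = \mathcal{I}^{-3/2}\sigma[\mathcal{I}]/2$.

```python
import numpy as np

def fisher_pure(psi_plus, psi_minus, psi_zero, dB):
    """Central-difference estimate 4 Tr[(d rho / dB)^2 rho_0] for pure states.

    psi_plus, psi_minus and psi_zero are the states evolved with B = +dB, -dB, 0
    under the same noise realization.
    """
    def rho(psi):
        return np.outer(psi, psi.conj())
    drho = (rho(psi_plus) - rho(psi_minus)) / (2.0 * dB)
    return 4.0 * np.trace(drho @ drho @ rho(psi_zero)).real

def bound_with_errorbar(fisher_samples):
    """Return (dB_bound, sigma) from per-realization Fisher information samples."""
    samples = np.asarray(fisher_samples, dtype=float)
    mean = samples.mean()
    return mean ** -0.5, 0.5 * mean ** -1.5 * samples.std(ddof=1)

# Usage on a toy unitary family (spin-1/2 rotated about y), where the exact
# Fisher information is 4 (gamma t)^2 Var(Fy) = (gamma t)^2.
Fy = np.array([[0, -0.5j], [0.5j, 0]])
w, v = np.linalg.eigh(Fy)
psi0 = np.array([1, 1], dtype=complex) / np.sqrt(2)
gamma_t = 0.7

def evolve(B):
    return (v * np.exp(1j * gamma_t * B * w)) @ v.conj().T @ psi0

I_toy = fisher_pure(evolve(1e-4), evolve(-1e-4), evolve(0.0), 1e-4)
```

For the conditioned double-pass dynamics, `evolve` would be replaced by three integrations of the stochastic Schrödinger equation sharing one noise record, and `bound_with_errorbar` would be applied to the collection of per-realization values.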
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Simulation Results 

We calculated $\mathcal{I}_t(B)$ over a range of spin quantum numbers $F = Nf$ spanning
more than an order of magnitude to determine a lower bound on the magnetic
field estimation uncertainty using Eq. (5.32). The results indicate that the Fisher
information depends heavily upon the choice of the coupling strengths $M$ and $K$,
which is not surprising since the measurement strength $M$ determines how much
spin-squeezing is generated and $K$ determines the strength of the effective
nonlinearity. Like any measurement procedure that involves amplification, both the signal
and noise are affected, and optimal performance requires choosing the correct gain.

If one chooses $M = 1/\tau$, to obtain an optimal spin-squeezed state at the final
time $t = \tau$ Geremia et al. [2003], then it is straightforward to optimize over the
nonlinearity as illustrated in the inset of Fig. 5.2 for $F = 100\hbar$. We found
that the optimal value $K^*$ depends upon the number of atoms, and that the Fisher
information saturates and then decreases if the number of atoms exceeds the value
of $N_{sat} = F_{sat}/f$ used to compute $K^*(F)$. Figure 5.2 shows the behavior of $\delta\tilde{B}_\tau$ as
a function of $F$ up to the saturation point $F < F_{sat} \sim 150$. The largest value of $F$
prior to saturation yields a $\delta\tilde{B}_\tau$ that is slightly below the bound $1/\tau\gamma F^{3/2}$ that would
be obtained for a two-body coupling Hamiltonian and an initially separable state $\rho_0$
Boixo et al. [2008]. Despite this saturation of the quantum Fisher information for
$F > F_{sat}$ at a given choice of $K$, one can choose the value of $K^*$ such that saturation
occurs only for $F_{sat} > F_{max}$ over any specified finite range $F < F_{max}$. An improvement
beyond $1/N$ scaling can be achieved over any physically realistic number of particles.

The saturation effect can be understood in light of the quantum stochastic model
of the previous subsection. In considering the general stochastic propagator of Eq.
(5.22), we identified the coupling operator $L = \sqrt{M} F_z + i\sqrt{K} F_y$, which if $M = K$, is
essentially the angular momentum lowering operator along x, $F_{-,x}$. If $M, K \gg \gamma B$,
a continuous measurement of this operator very quickly moves the +x-polarized
initial state onto the -x-polarized state, which is an attractive fixed point of $F_{-,x}$.
Once this state is reached, the dynamics become relatively insensitive to the magnetic
field value and result in a poor uncertainty lower bound. On the other hand, if $M, K$
are much smaller than $\gamma B$, the positive feedback from the $i\sqrt{K} F_y$ term is washed out
by Larmor precession. Given that we are interested in detection limits, i.e. $B \approx 0$,
we do not focus on the regime where Larmor precession dominates.

A second approach to avoiding saturation of the Fisher information for large $F$
is to scale the parameters $M$ and $K$ as a decreasing function of $F$. For practical
considerations, it is also desirable to set $M = K$, as these parameters are determined
by the atom-field coupling strengths on the first- and second-pass interactions, and
thus by quantities such as the laser intensity and detuning that are not easily changed
between the two passes. We have found that scaling $M$ and $K$ according to the
functional form

$$ M = K = c/\tau F^{\alpha}, \tag{5.34} $$

where $c$ and $\alpha$ are constants, leads to a power-law scaling for the uncertainty bound
$\delta\tilde{B}_\tau \sim F^{k}$. The inset plot in Fig. 5.3 shows the slope $k$ of a linear fit of
$\log_{10}\delta\tilde{B}_\tau$ to $\log_{10}F$ (i.e., a slope of $k = -1$ corresponds to the Heisenberg
uncertainty scaling) as a function of $\alpha$ (with $c$ chosen so as to avoid the saturation
behavior described above). As demonstrated by the data points in Fig. 5.3, it is
possible to achieve $1/N$ scaling (to within a small prefactor offset) with $\alpha = 0.77$ and
$c = 0.589$. The distribution of conditional uncertainties $\delta\tilde{B}_\tau | Z_t$ for the statistical
ensemble of measurement realizations [dots in Fig. 5.3] is depicted for the different
values of $F$. The mean and uncertainty of this distribution are denoted by the circles
and errorbars, and a fit to this data gives $\delta\tilde{B}_\tau \sim F^{-0.97}$.

In short, Figures 5.2 and 5.3 suggest that there are some parameter values,
appropriate for some range of $F$, which show an estimator uncertainty lower bound
scaling at or beyond the Heisenberg limit. In practice, it seems that one would need
to fine-tune the coupling strengths $M$ and $K$ in order to be in a regime with such
scaling. It may be that such coupling strengths are inaccessible in an experimental
setting. While this is an important consideration, there is a more pressing theoretical
question: does a practical estimator exist which saturates the quantum Cramér-Rao
bound? I summarize our search for such an estimator in the following section.

Figure 5.3: Evidence that the field estimation uncertainty $\Delta B$ can be made to
scale as a power law in $N$ by decreasing the parameters $M$ and $K$ as a
function of the total angular momentum $F$ according to Eq. (5.34) with $\alpha \approx 3/4$.
The power-law fit (solid line) has a slope of $-0.97$.
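The scaling exponents quoted above come from straight-line fits on log-log axes. A minimal sketch of such a fit is given below, assuming the uncertainty bounds have already been computed for each $F$; the function name `fit_power_law` and the synthetic data are hypothetical and serve only to show the procedure, not to reproduce the simulation results.

```python
import numpy as np

def fit_power_law(F_values, dB_values):
    """Fit dB = prefactor * F**slope by linear regression of log10(dB) on log10(F)."""
    slope, intercept = np.polyfit(np.log10(F_values), np.log10(dB_values), 1)
    return slope, 10.0 ** intercept

# Usage with synthetic data that follows an exact power law.
F_values = np.array([10.0, 20.0, 40.0, 80.0, 160.0])
dB_values = 0.3 * F_values ** -0.97
slope, prefactor = fit_power_law(F_values, dB_values)
```

A slope of $-1/2$ would indicate shot-noise scaling and a slope of $-1$ Heisenberg scaling; with noisy simulation data one would also want a confidence interval on the slope, for example from the covariance returned by `np.polyfit(..., cov=True)`.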



5.4 Magnetic Field Estimators 

While studying the properties of lower bounds on estimator performance is important 
for developing an understanding of the capabilities of a given parameter coupling 
scheme, any actual procedure for implementing quantum parameter estimation must 
also develop a constructive procedure for doing the estimation. 

In this section, we consider two methods for estimating the strength of the
magnetic field $B$ based on the stochastic measurement record $Z_{(0,t)}$. In both cases, we
extend the quantum filters developed in the previous section to account for our
uncertainty in $B$, which in turn results in new filters capable of estimating $B$.



5.4.1 Quantum Particle Filter 

The technique of quantum particle filtering, as developed in Chapter 4 and reviewed 
below, leverages the fact that the quantum filtering equations already provide a 
means for estimating the state of a quantum system conditioned on the measure- 
ment record. If we place the magnetic field parameter on the same footing as the 
quantum state, we can simply apply the quantum filtering results we already derived. 
Indeed, by embedding the magnetic field parameter as a diagonal operator in an
auxiliary Hilbert space, the quantum filter still gives the best estimate of both system
and auxiliary space operators. We accomplish this by promoting the magnetic field
parameter to the diagonal operator

$$ B \mapsto \hat{B} = \int B\, |B\rangle\langle B|\, dB \in \mathcal{H}_B, \tag{5.35} $$

where $\mathcal{H}_B$ is the new auxiliary Hilbert space with basis states satisfying
$\hat{B}|B\rangle = B|B\rangle$ and $\langle B|B'\rangle = \delta(B - B')$. All atomic operators and states, which are
associated with the atomic Hilbert space $\mathcal{H}_A$, act as the identity on $\mathcal{H}_B$, e.g.
$F_z \mapsto I \otimes F_z$. The only operator which joins the two spaces is the magnetic dipole
Hamiltonian, which is now given by

$$ H \mapsto -\hbar\gamma\, \hat{B} \otimes F_y. \tag{5.36} $$

The derivation of the quantum filtering equation is essentially unchanged, provided
one replaces atomic operators with the appropriate forms for the joint space $\mathcal{H}_B \otimes \mathcal{H}_A$.
For parameter estimation, the adjoint form is the more convenient version of the
quantum filter. Since $B$ corresponds to a classical parameter, we require the marginal
density matrix $(\rho_B)_t = \mathrm{Tr}_{\mathcal{H}_A}[\rho_t]$ be diagonal in the basis of $\hat{B}$, so that it
corresponds to a classical probability distribution. This suggests we write the total
conditional density matrix in the ensemble form

$$ \rho_t^E = \int dB\, p_t(B)\, |B\rangle\langle B| \otimes \rho_t^{(B)}, \tag{5.37} $$

where $p_t(B) = P(B | Z_{(0,t)})$ is precisely the conditional probability density for $B$.

While one could attempt to update this state via the quantum filter, doing so
is entirely impractical, as one cannot represent an arbitrary distribution for $p_t(B)$
with finite resources. Instead, one approximates the distribution with a weighted set
of point masses or particles:

$$ p_t(B) \approx \sum_{i=1}^{N} p_t^{(i)}\, \delta(B - B_i). \tag{5.38} $$

The approximation can be made arbitrarily accurate in the limit of $N \to \infty$.
Plugging this distribution into the ensemble density matrix form of Eq. (5.37) gives

$$ \rho_t^E \approx \sum_{i=1}^{N} p_t^{(i)}\, |B_i\rangle\langle B_i| \otimes \rho_t^{(B_i)}. \tag{5.39} $$

Each of the $N$ triples $\{p_t^{(i)}, B_i, \rho_t^{(B_i)}\}$ is called a quantum particle. Intuitively, the
particle filter works by discretizing the parameter space and then evolving an ensemble
of quantum systems according to the exact dynamics for each parameter value. The
filtering equations below perform Bayesian inference on this ensemble, updating the
relative probabilities of particular parameter values given the measurement record.

The quantum particle filter for the double-pass system with unknown $B$ is found
by plugging the discretized ensemble $\rho_t^E$ into the extended double-pass filter. After
a little manipulation, one finds

$$ dp_t^{(i)} = 2\sqrt{M} \left( \mathrm{Tr}[F_z \rho_t^{(B_i)}] - \sum_{j=1}^{N} p_t^{(j)}\, \mathrm{Tr}[F_z \rho_t^{(B_j)}] \right) p_t^{(i)}\, dW_t \tag{5.40a} $$

$$ d\rho_t^{(B_i)} = i\gamma B_i\, [F_y, \rho_t^{(B_i)}]\, dt + i\sqrt{KM}\, [F_y, \{F_z, \rho_t^{(B_i)}\}]\, dt
 + M\, \mathcal{D}[F_z]\rho_t^{(B_i)}\, dt + K\, \mathcal{D}[F_y]\rho_t^{(B_i)}\, dt
 + \left( \sqrt{M}\, \mathcal{M}[F_z]\rho_t^{(B_i)} + i\sqrt{K}\, [F_y, \rho_t^{(B_i)}] \right) dW_t \tag{5.40b} $$

$$ dW_t = dZ_t - 2\sqrt{M} \sum_{i=1}^{N} p_t^{(i)}\, \mathrm{Tr}[F_z \rho_t^{(B_i)}]\, dt \tag{5.40c} $$

where the prior distribution $p_0(B)$ is used to determine the initial parameter weights,
$p_0^{(i)}$, and values, $B_i$. All initial quantum states, $\rho_0^{(B_i)}$, are taken to be the spin
coherent state pointing along +x.
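The weight update is ordinary Bayesian inference on the measurement increments. As a rough illustration (not the filter's continuous-time form), the sketch below uses the exact discrete-time Bayes rule, assuming that, conditioned on particle $i$, the increment $dZ$ is Gaussian with mean $2\sqrt{M}\langle F_z\rangle_i\, dt$ and variance $dt$; expanding this rule to first order in $dt$ gives a weight equation driven by the ensemble innovation. The function names `update_weights` and `posterior_mean_std` are hypothetical, and the per-particle expectation values are taken as given inputs rather than evolved quantum states.

```python
import numpy as np

def update_weights(weights, fz_means, dZ, dt, M=1.0):
    """Bayes update of particle weights for one measurement increment dZ.

    Assumes dZ | particle i ~ Normal(2 sqrt(M) <Fz>_i dt, dt).
    """
    weights = np.asarray(weights, dtype=float)
    residual = dZ - 2.0 * np.sqrt(M) * np.asarray(fz_means, dtype=float) * dt
    log_w = np.log(weights) - residual ** 2 / (2.0 * dt)
    log_w -= log_w.max()                      # guard against underflow
    new = np.exp(log_w)
    return new / new.sum()

def posterior_mean_std(weights, B_values):
    """Posterior mean and standard deviation of B under the particle weights."""
    B_values = np.asarray(B_values, dtype=float)
    mean = np.sum(weights * B_values)
    return mean, np.sqrt(np.sum(weights * (B_values - mean) ** 2))

# Usage: three particles with fixed <Fz>; noise-free increments generated by the third.
weights = np.full(3, 1.0 / 3.0)
fz_means = np.array([-1.0, 0.0, 1.0])
dt = 0.01
for _ in range(2000):
    weights = update_weights(weights, fz_means, dZ=2.0 * 1.0 * dt, dt=dt)
```

In the full particle filter each $\langle F_z\rangle_i$ changes in time, because every particle carries its own conditional quantum state evolved with its own field value $B_i$.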

An estimate of the magnetic field strength is then constructed from the
approximate density in Eq. (5.38), either by taking the most probable $B$ value,
corresponding to the largest $p_t^{(i)}$, or by calculating the expected value of $B$,

$$ \tilde{B} = E[B] = \sum_{i=1}^{N} p_t^{(i)} B_i. \tag{5.41} $$

For the latter estimate, the uncertainty is given by

$$ \delta\tilde{B} = \sqrt{\sum_{i=1}^{N} p_t^{(i)} \left(B_i - \tilde{B}\right)^2}. \tag{5.42} $$

5.4.2 Quantum Kalman Filter

Rather than constructing a magnetic field estimator from the exact quantum
dynamics, one could instead first focus on deriving an approximate filter for the atomic
state, which is then a starting point for the magnetic field estimator. Indeed, previous
work in precision magnetometry via continuous measurement Geremia et al. [2003]
has taken this route by constructing a quantum Kalman filter to describe the atomic
dynamics. Such a filter leverages the fact that for an initially spin-polarized state
of many atoms (say along +x), a first-order Holstein-Primakoff expansion Holstein
and Primakoff [1940] linearizes the small-angle dynamics in terms of a Gaussian state
characterized by the means $\pi_t[F_z]$, $\pi_t[F_y]$ and the covariances $\Delta F_z^2$, $\Delta F_y^2$,
$\Delta F_z F_y$. Just as we saw in developing the Kalman-Bucy filter of Theorem 2.8, the
conditional state for a linear system with Gaussian noise is itself described by a
Gaussian distribution and therefore only requires filtering equations for the means and
a deterministic equation for the variances Kalman [1960]; Kalman and Bucy [1961].
For the case of magnetometry, the number of these parameters is independent of the
number of atoms in the atomic ensemble. We will also find that within this
approximation, we can again embed $B$ as an unknown state parameter and find a
corresponding Kalman filter appropriate for estimating its value.

However, applying the small-angle and Gaussian approximations in the quantum 
case is usually done in an ad-hoc fashion, especially in light of the recent introduction 
of projection filtering into the quantum filtering setting Mabuchi [2008]; van Handel
and Mabuchi [2005b]. In this framework, one selects a convenient manifold of states
whose parameterization reflects the approximations to enforce. At each point in 
this manifold, the exact differential dynamics induced on these states is orthogonally 
projected back into the chosen family. For our purposes, this means projecting the 
filter in Eq. (5.28) onto a manifold of Gaussian spin states. Although the resulting 
equations are not substantively different than those derived less carefully, we believe 
the potential application of projection filtering in deriving other approximate filters 
and master equations warrants the following exposition. 

Projection Filter Overview 

Abstractly, projection filtering proceeds as follows. We assume we already have a 
dynamical equation, such as Eq. (5.28), for a given manifold of states, such as pure 
states. For convenience, let these dynamics be represented as

$$ d|\psi\rangle_t = \mathcal{N}[|\psi\rangle_t], \tag{5.43} $$

where $\mathcal{N}$ is the generator of dynamics. Now select the desired family of
"approximating" states which are a submanifold of the exact states. We assume this
family is parameterized by a finite number of parameters $x_1, x_2, \ldots, x_n$ and we
denote states in this family as $|x_1, x_2, \ldots, x_n\rangle$. At every point in this
manifold, the tangent space is spanned by the tangent vectors

$$ v_i = \frac{\partial |x_1, x_2, \ldots, x_n\rangle}{\partial x_i}. \tag{5.44} $$

Loosely speaking, these tangent vectors tell us how differential changes in the
parameters move us through the corresponding submanifold of $|x_1, x_2, \ldots, x_n\rangle$
states in the space of pure states. This is particularly useful, as the action of the
generator $\mathcal{N}[|x_1, x_2, \ldots, x_n\rangle]$ does not necessarily result in a state
within the family. But by projecting the dynamics onto the tangent space, we can find a
filter, called the projection filter, which constrains evolution within the chosen
submanifold. Explicitly, for orthogonal tangent vectors this projection is written as

$$ \Pi_{\mathrm{span}\{v_i\}}\big[ d|x_1, \ldots, x_n\rangle \big] = \sum_{i=1}^{n} \frac{\langle v_i,\, d|x_1, \ldots, x_n\rangle \rangle}{\langle v_i, v_i \rangle}\, v_i, \qquad d|x_1, \ldots, x_n\rangle = \mathcal{N}[|x_1, \ldots, x_n\rangle], \tag{5.45} $$

where in this pure state formulation, the inner product is the standard Hilbert space
inner product.

Gaussian State Family and Tangent Vectors

For our double-pass magnetometer, we begin by introducing the two-parameter family
of Gaussian states

$$ |\theta_t, \xi_t\rangle = e^{-i\theta_t F_y}\, e^{-2i\xi_t (F_z F_y + F_y F_z)} |F, +F_x\rangle = Y_{\theta_t} S_{\xi_t} |F, +F_x\rangle, \tag{5.46} $$




where $|F, +F_x\rangle$ is the spin coherent state pointing along +x, $S_{\xi_t}$ is a spin squeezing
operator Kitagawa and Ueda [1993] with squeezing parameter $\xi_t$ and $Y_{\theta_t}$ is a rotation
about the y-axis by angle $\theta_t$. Intuitively, the squeezing along z generated by $S_{\xi_t}$
corresponds to the squeezing induced by measuring $F_z$. The rotation via $Y_{\theta_t}$ then
accounts for both the random evolution due to the measurement as well as any
rotation induced by the magnetic field. The tangent vectors for these states are

$$ v_{\theta_t} = \frac{\partial |\theta_t, \xi_t\rangle}{\partial \theta_t} = -i F_y Y_{\theta_t} S_{\xi_t} |F, +F_x\rangle \tag{5.47} $$

$$ v_{\xi_t} = \frac{\partial |\theta_t, \xi_t\rangle}{\partial \xi_t} = Y_{\theta_t} S_{\xi_t} \big(-2i (F_z F_y + F_y F_z)\big) |F, +F_x\rangle. \tag{5.48} $$

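In calculating the normalization of these tangent vectors, we encounter terms
such as

$$ \langle v_{\theta_t}, v_{\theta_t} \rangle = \langle F, +F_x | S_{\xi_t}^\dagger F_y^2 S_{\xi_t} | F, +F_x \rangle. \tag{5.49} $$

More generally, almost all inner-products needed for the projection filter will be of
the form

$$ \langle F, +F_x | S_{\xi_t}^\dagger\, g(F_x, F_y, F_z)\, Y_{\theta_t}^\dagger f(F_x, F_y, F_z) Y_{\theta_t} S_{\xi_t} | F, +F_x \rangle. $$

Here, g and f are polynomial functions of their arguments. Since $Y_{\theta_t}$ is a rotation,
we can exactly evaluate

$$ Y_{\theta_t}^\dagger f(F_x, F_y, F_z) Y_{\theta_t} = f(Y_{\theta_t}^\dagger F_x Y_{\theta_t},\, Y_{\theta_t}^\dagger F_y Y_{\theta_t},\, Y_{\theta_t}^\dagger F_z Y_{\theta_t}), \tag{5.50} $$

where

$$ Y_{\theta_t}^\dagger F_x Y_{\theta_t} = F_x(\theta_t) = F_x \cos\theta_t + F_z \sin\theta_t \tag{5.51} $$

$$ Y_{\theta_t}^\dagger F_y Y_{\theta_t} = F_y \tag{5.52} $$

$$ Y_{\theta_t}^\dagger F_z Y_{\theta_t} = F_z(\theta_t) = F_z \cos\theta_t - F_x \sin\theta_t. \tag{5.53} $$

This leaves us with expectations of the form

$$ \langle F, +F_x | S_{\xi_t}^\dagger\, g(F_x, F_y, F_z)\, f(F_x(\theta_t), F_y, F_z(\theta_t))\, S_{\xi_t} | F, +F_x \rangle \tag{5.54} $$

where $g \times f$ will just be linear combinations of powers and products of $F_x$, $F_y$, $F_z$.
Unfortunately, we cannot evaluate this expectation for arbitrary $\xi_t$. However, for
small $\xi_t$ the state which we are taking expectations with respect to is the "squeezed
vacuum" in our preferred basis, i.e. it is the state $|F, +F_x\rangle$ pointing in the same
direction, but with squeezed uncertainty in $F_z$ and increased uncertainty in $F_y$.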

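For large F, angular momentum expectations of such a state are extremely well
described by the Holstein-Primakoff approximation to lowest order Holstein and
Primakoff [1940]

$$ F_{+x} \approx \sqrt{2F}\, a, \qquad F_{-x} \approx \sqrt{2F}\, a^\dagger, \tag{5.55} $$

where $F_{\pm x} = F_y \pm i F_z$, and $a^\dagger$, $a$ are bosonic creation and annihilation operators. We
then write our state as $|F, +F_x\rangle = |0\rangle$, which is the vacuum in the Holstein-Primakoff
representation. Under this approximation, we can use the relations

$$ S_{\xi_t}^\dagger F_x S_{\xi_t} = F \tag{5.56} $$

$$ S_{\xi_t}^\dagger F_y S_{\xi_t} = \sqrt{\tfrac{F}{2}}\, e^{4F\xi_t} (a + a^\dagger) \tag{5.57} $$

$$ S_{\xi_t}^\dagger F_z S_{\xi_t} = -i \sqrt{\tfrac{F}{2}}\, e^{-4F\xi_t} (a - a^\dagger) \tag{5.58} $$

to evaluate the expectation in Eq. (5.54). In light of this approximation, the tangent
vector overlaps are readily shown to be

$$ \langle v_{\theta_t}, v_{\theta_t} \rangle = \frac{F}{2} e^{8F\xi_t} \tag{5.59} $$

$$ \langle v_{\xi_t}, v_{\xi_t} \rangle = 8F^2 \tag{5.60} $$

$$ \langle v_{\xi_t}, v_{\theta_t} \rangle = 0, \tag{5.61} $$

where the last result indicates the tangent vectors are orthogonal as desired.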
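
The tangent vector overlaps can be spot-checked numerically at $\theta_t = 0$, $\xi_t = 0$, where no Holstein-Primakoff approximation is needed. The following is a minimal sketch, assuming NumPy; the helper names `spin_matrices` and `tangent_overlaps` are illustrative and do not come from the original derivation. With exact spin-F matrices the overlaps are F/2, 8F^2 - 4F and 0, so the 8F^2 of Eq. (5.60) is recovered up to a relative correction of order 1/F.

```python
import numpy as np

def spin_matrices(F):
    """Spin-F matrices in the F_z eigenbasis, ordered m = F, F-1, ..., -F."""
    m = np.arange(F, -F - 1, -1, dtype=float)
    Fz = np.diag(m).astype(complex)
    # <m+1|F_+|m> = sqrt(F(F+1) - m(m+1)) lies on the first superdiagonal
    ladder = np.sqrt(F * (F + 1) - m[1:] * (m[1:] + 1))
    Fplus = np.diag(ladder, k=1).astype(complex)
    Fx = (Fplus + Fplus.conj().T) / 2
    Fy = (Fplus - Fplus.conj().T) / 2j
    return Fx, Fy, Fz

def tangent_overlaps(F):
    """Overlaps of v_theta and v_xi evaluated at theta = 0, xi = 0."""
    Fx, Fy, Fz = spin_matrices(F)
    _, vecs = np.linalg.eigh(Fx)
    psi = vecs[:, -1]                      # |F, +F_x>, largest F_x eigenvalue
    v_theta = -1j * (Fy @ psi)
    v_xi = -2j * ((Fz @ Fy + Fy @ Fz) @ psi)
    return (np.vdot(v_theta, v_theta).real,
            np.vdot(v_xi, v_xi).real,
            abs(np.vdot(v_xi, v_theta)))

if __name__ == "__main__":
    F = 20
    tt, xx, cross = tangent_overlaps(F)
    print(tt, F / 2)          # exact value F/2
    print(xx, 8 * F**2)       # exact value 8F^2 - 4F, close to 8F^2 for large F
    print(cross)              # orthogonality
```

The check only covers the unsqueezed, unrotated point of the family; the dependence on $\xi_t$ still relies on the lowest-order Holstein-Primakoff relations.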
Orthogonal Projection of Double-pass Filter

Before performing orthogonal projection of the dynamics onto the tangent space, we
must first convert the filtering equation from Ito to Stratonovich form. As is discussed
in Ref. van Handel et al. [2005], the Ito chain rule is incompatible with the differential
geometry picture of projecting onto the tangent space. Fortunately, Stratonovich
stochastic integrals follow the standard chain rule and are thus amenable to projection
filtering methods. Following the derivation in Appendix 5.A, we find that the
Stratonovich SSE is given by



d\ij)t 



+2iVkm(f}) F^ + ty/KM ( F^Fy 



Fx 



+ 



)tdt 



(5.62) 



+ iVKF, 



dWt, 



where $\langle \Delta F_z^2 \rangle = \langle F_z^2 \rangle - \langle F_z \rangle^2$.

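In order to find the projection filter, we compare the general projection formula
in Eq. (5.45) to the general dynamical equation for states in our chosen family, given
by

$$ d|\theta_t, \xi_t\rangle = v_{\theta_t}\, d\theta_t + v_{\xi_t}\, d\xi_t. \tag{5.63} $$

Using the orthogonality of the tangent vectors, the general forms for $d\xi_t$ and $d\theta_t$ are

$$ d\theta_t = \frac{2}{F} e^{-8F\xi_t} \big\langle v_{\theta_t},\, d|\psi_t\rangle[\xi_t, \theta_t] \big\rangle \tag{5.64} $$

$$ d\xi_t = \frac{1}{8F^2} \big\langle v_{\xi_t},\, d|\psi_t\rangle[\xi_t, \theta_t] \big\rangle, \tag{5.65} $$

where $d|\psi_t\rangle[\xi_t, \theta_t]$ is the evolution of $|\theta_t, \xi_t\rangle$ under the Stratonovich filter of Eq. (5.62).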

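As an example calculation using these methods, consider projecting the dynamics
generated by the magnetic field term. Its contribution $c_\theta$ to the $\theta_t$ dynamics is given
by

$$ c_\theta = \frac{2}{F} e^{-8F\xi_t} \big\langle v_{\theta_t},\, -i\gamma B F_y |\theta_t, \xi_t\rangle dt \big\rangle = \gamma B \langle 0 | (a + a^\dagger)^2 | 0 \rangle dt = \gamma B\, dt. \tag{5.66} $$

Similarly, the contribution to $\xi_t$ is

$$ c_\xi = \frac{1}{8F^2} \big\langle v_{\xi_t},\, -i\gamma B F_y |\theta_t, \xi_t\rangle dt \big\rangle = \frac{\gamma B}{4F^2} \langle 0 | S_{\xi_t}^\dagger (F_z F_y + F_y F_z) Y_{\theta_t}^\dagger F_y Y_{\theta_t} S_{\xi_t} | 0 \rangle dt \propto \langle 0 | a^3 + a^2 a^\dagger - a^{\dagger 2} a - a^{\dagger 3} | 0 \rangle dt = 0. \tag{5.67} $$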

Chugging through the remaining terms in a similar fashion, we arrive at the full 
projection filter equations 



d9t = -fBdt + ^^^ e-8J"6 ^in Otdt + 2FVKM sin 9tdt 
/Me-®^«* cos Ot + Vk] o dWt 



and 



d^t 



M 



e-«^«' cos^^.rft. 



Converting back to Ito form using Eq. (2.74), we have 



de, 



dit 



M 



Bj - —e-^^^^' sm{29t) + 2FVKM sin Ot 



dt 



M 



Me"®^^* cos Ot + y/K 
e-«^«* cos^ Otdt, 



dWt 



(5.68) 



(5.69) 



(5.70a) 
(5.70b) 
where the innovations are now in terms of the approximation of $\langle F_z \rangle_t$ within the
Gaussian family:

$$ dW_t = dZ_t - 2\sqrt{M} \langle F_z \rangle_t\, dt = dZ_t + 2F\sqrt{M} \sin\theta_t\, dt. \tag{5.71} $$



Small-angle Kalman Filter 

We see that the projected filter in Eq. (5.70) is actually more general than the filters 
usually derived for the magnetometry problem, which do not distinguish the Gaussian 
and small-angle approximations. That is, the family of states in Eq. (5.46) and the 
approximations considered in the above derivation only enforce the Gaussian state 
assumption through the Holstein-Primakoff approximation. We can separately apply 
the small-angle approximation to recover an equation appropriate for the Kalman 
filter. In this limit, the equation for $\xi_t$ completely decouples and has a closed form
solution



Taking the small-angle approximation for $\theta_t$ and plugging in the explicit form of
$\xi_t$ gives



which is linear in the remaining state parameter $\theta_t$.

While we could consider the Kalman filter for the quantum state alone, we can
just as easily account for our uncertainty in B at the same time. That is, if we now
embed B as a state variable, setting $X_t = [\theta_t \;\; B]^T$, the dynamics can be written in a



$$ \xi_t = \frac{1}{8F} \ln[1 + 2FMt]. \tag{5.72} $$





+ Vk dWt, (5.73) 



1 + 2FMt 
linear form as



dXt 
dZt 

A 

B 

C 
D 



AXtdt + BdWt 
CXtdt + DdWt 



( 



2F\/KM 2(l+2FMt)2 



M 



l+2FMt 









2VMF 



(5.74) 
(5.75) 

(5.76) 

(5.77) 

(5.78) 
(5.79) 



Equations (5.74) and (5.75) are precisely a classical linear system/observation pair,
in which the same white noise process (the innovations) drives both the system and
observation processes. The estimate $\hat{X}_t = E[X_t | Z_{(0,t)}]$ admits a Kalman filter solution
Liptser and Shiryayev [1977], given by

$$ d\hat{X}_t = A \hat{X}_t\, dt + (B + V C^T)\, dW_t $$

$$ \frac{dV}{dt} = A V + V A^T + B B^T - (B + V C^T)(B + V C^T)^T, \tag{5.80} $$

where V is the covariance matrix

$$ V = E[(X - E[X])(X - E[X])^T] \tag{5.81} $$

$$ \phantom{V} = \begin{pmatrix} \Delta\theta_t^2 & \Delta\theta_t B_{kf} \\ \Delta\theta_t B_{kf} & \Delta B_{kf}^2 \end{pmatrix} \tag{5.82} $$

and

$$ dW_t = dZ_t + 2F\sqrt{M}\, \hat{\theta}_t\, dt \tag{5.83} $$

is the innovations constructed from the current $\theta_t$ estimate in the small-angle
approximation.
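
To make the structure of Eq. (5.80) concrete, the following is a minimal sketch of one Euler step of the estimate and covariance equations, assuming NumPy, a scalar observation and D = 1. The function name `kalman_bucy_step` and the numerical values in the usage lines are illustrative assumptions, not the matrices of Eqs. (5.76)-(5.79). The innovations increment is taken as dZ - C X dt, which reproduces Eq. (5.83) under the assumption that C = (-2F sqrt(M), 0).

```python
import numpy as np

def kalman_bucy_step(x_hat, V, dZ, A, B, C, dt):
    """One Euler step of the correlated-noise Kalman filter (scalar observation, D = 1).

    x_hat : (n, 1) estimate, V : (n, n) covariance,
    A : (n, n), B : (n, 1), C : (1, n), dZ : scalar measurement increment.
    """
    gain = B + V @ C.T                           # (n, 1) gain B + V C^T
    dW = dZ - (C @ x_hat).item() * dt            # innovations increment
    x_new = x_hat + A @ x_hat * dt + gain * dW
    V_new = V + (A @ V + V @ A.T + B @ B.T - gain @ gain.T) * dt
    return x_new, V_new

if __name__ == "__main__":
    # Illustrative numbers only: no drift, no process noise, observe the first state.
    A = np.zeros((2, 2))
    B = np.zeros((2, 1))
    C = np.array([[3.0, 0.0]])
    x_hat = np.zeros((2, 1))
    V = 2.0 * np.eye(2)
    x_hat, V = kalman_bucy_step(x_hat, V, dZ=0.0, A=A, B=B, C=C, dt=0.01)
    print(V)   # the variance of the observed component shrinks
```

In an actual integration A and B would be re-evaluated at each time step, since they depend on t in the small-angle filter, and a smaller step or a higher-order scheme may be needed for stiff parameter choices.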
Looking at the explicit system of equations for the variances, which unfortunately
do not admit a straightforward analytic Riccati solution as discussed in Appendix 
A, we have 

dt V (1 + 2FMt)2 * 

+2^MtBuf (5.84) 

-AF^M{M,Bkff (5.85) 



dt 



dt ^ 2{l + 2FMty 

{l + AF + 8F^Mt+ (5.86) 

8F^{1 + 2FMtfMf) MtBkf 



which are completely independent of the second-pass coupling strength K. That is,
within the small-angle and Gaussian approximations, the double-pass system has no
improvement in sensitivity and gives rise to the same $F^{-1}$ uncertainty scaling found
previously for single-pass systems Geremia et al. [2003]. Perhaps this is unsurprising,
as we attempted to find a linear description of an essentially non-linear effect.
Indeed, the numeric simulations in the next section suggest the single-mode Gaussian
approximation breaks down just as the double-pass filter begins to show improved
sensitivity to the magnetic field parameter. Finally, note that I have also derived a
filtering equation which retains the next term in the Holstein-Primakoff expansion,
but whose K dependence nonetheless shows a negligible change relative to the lowest
order expansion.



5.5 Simulations 



Given the absence of an analytic improvement in the sensitivity of the quantum
Kalman filter, we turn to numerical simulations of the quantum particle filter in
order to gauge the potential of the double-pass system for magnetometry. First recall
how the filter would be used in an actual experiment. Continuous measurements of
the atomic cloud Larmor precessing under a particular, albeit unknown, magnetic
field B would give rise to the observations process $Z_{(0,t)}$. This would then be fed
into a classical computer to propagate the quantum particle filtering equations given
in (5.40). The computer would then use the quantum particle set to provide the
estimate $B_{pf}$ and uncertainty $\Delta B_{pf}$.

In order to simulate such an experiment, we can generate the stochastic measurement
record $Z_{(0,t)}$ using the quantum filter for the double-pass system given in
Eq. (5.28), evolved with a known magnetic field B. Since the system is driven by
the white noise process $dW_t$, the filtering equations may be integrated by the same
integrator previously used to approximate the quantum Cramer-Rao bound. The
measurements generated by these trajectories are equivalent to what the quantum
particle filter would receive in an experiment, which means they can then be fed
into the same particle filtering code to simulate an estimate of B. In order to compare
performance, we actually simulate two systems in parallel, one representing the
double-pass system and the other, with K = 0, representing a single-pass system.
Both utilize the same noise realizations on an individual trajectory.

As is common when considering detection limits, we focus on the case of B = 0.
Although an unbiased estimator would assume no prior knowledge of the magnetic
field value, such an approach is impractical for the particle filter, which would fail in
approximating such large uncertainty with a finite number of particles. As such, we
take the initial distribution of B values for the quantum particle set to be Gaussian

$$ p(B) = \frac{1}{\sqrt{2\pi\sigma_B^2}} \exp\left[ -\frac{(B - \mu_B)^2}{2\sigma_B^2} \right] \tag{5.87} $$

with mean $\mu_B = 0$ and variance $\sigma_B^2 = 10\tau^{-2}$, where we again set $\gamma = 1$ and again
define all parameters in units of $\tau^{-1}$. For a set of N particles, the particle magnetic
field values $\{B_i\}$ are drawn from the initial distribution, with weights $w_i = 1/N$.
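
The particle initialization described above can be sketched in a few lines. This is a minimal illustration assuming NumPy; the function names `init_particles` and `estimate`, the random seed, and the use of the weighted mean and weighted standard deviation as the estimate and its uncertainty are assumptions made here for illustration, not a transcription of the simulation code used for the figures.

```python
import numpy as np

def init_particles(n_particles, mean, variance, seed=0):
    """Draw particle field values B_i from the Gaussian prior with equal weights."""
    rng = np.random.default_rng(seed)
    B = rng.normal(loc=mean, scale=np.sqrt(variance), size=n_particles)
    w = np.full(n_particles, 1.0 / n_particles)
    return B, w

def estimate(B, w):
    """Weighted mean and standard deviation of the particle set."""
    B_pf = np.sum(w * B)
    dB_pf = np.sqrt(np.sum(w * (B - B_pf) ** 2))
    return B_pf, dB_pf

if __name__ == "__main__":
    B, w = init_particles(1000, mean=0.0, variance=10.0)
    print(estimate(B, w))   # prior estimate and uncertainty before any measurement
```

Before any measurement data are processed, the estimate simply reflects the prior, which is why a finite-variance prior introduces the bias discussed later in this section.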
Figure 5.4: Estimator uncertainties as a function of F averaged over 100
trajectories with $M = 10\tau^{-1}$, $K = 0.0006\tau^{-1}$, B = 0 and $\tau = 0.1$. The initial
N = 1000 particle set was drawn from a Gaussian distribution with mean zero
and variance $10\tau^{-2}$, which was also the same initial uncertainty in the Kalman
filter ABkf. A power-law fit to the particle filter (PF) scalings shows a single- 
pass scaling of F~^'^^ and a double-pass scaling of F~^'^^. Also shown are the 
quantum Cramer-Rao (QCR) bounds previously simulated for Figure 5.2. The 
inset shows the sample estimator deviation Spj for the same simulations. 



The initial quantum state for all particles is set to the spin-coherent state along
+x, i.e. $|F, +F_x\rangle$.

Figure 5.4(a) shows, with solid lines, the average particle filter uncertainty $\Delta B_{pf}$
as a function of F, averaged over 100 measurement realizations using N = 1000
particles in each run of the filter. The error bars represent the deviation in the
simulated uncertainties over the 100 runs. As was the case for the Fisher information
calculations, we observe an improved sensitivity scaling for the double-pass system,
albeit with increased fluctuations in the individual run uncertainty ABpj. Power- 
law least-squares fits of the average give a single-pass uncertainty scaling _p~o.93 ^-^^ 
a double-pass scaling of F~^'^^ which are consistent with the quantum Cramer- Rao 
scalings in figure 5.2. Also shown is the analytic single-pass uncertainty scaling given 
by numerical integration of the Kalman covariance matrix via Eq. (5.84). We see 
that this agrees very well with the single-pass particle filter scaling and since it is 
consistent with previous Kalman filters used for magnetometry Geremia et al. [2003],
suggests the double-pass scaling does indicate improved sensitivity.
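
The power-law least-squares fits quoted above amount to a straight-line fit in log-log coordinates. The following is a minimal sketch assuming NumPy; the function name `power_law_fit` and the synthetic data in the usage lines are illustrative and are not the simulation data behind Figure 5.4.

```python
import numpy as np

def power_law_fit(F, dB):
    """Least-squares fit of dB = c * F**alpha, returning (alpha, c)."""
    alpha, log_c = np.polyfit(np.log(F), np.log(dB), 1)
    return alpha, np.exp(log_c)

if __name__ == "__main__":
    # Synthetic, noise-free data with a known exponent, for illustration only.
    F = np.array([10.0, 20.0, 40.0, 80.0, 160.0])
    dB = 3.0 * F ** -0.93
    print(power_law_fit(F, dB))
```

Fitting in log space weights relative rather than absolute deviations, which suits uncertainties that span orders of magnitude across the range of F.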

Of course, these statements are not without caveats. The dashed lines in the
plot correspond to the numerically computed quantum Cramer-Rao bound, which is
clearly below the estimates of all the filters. This might mean that the numerical
bound for the continuous measurement is simply not saturated by the
corresponding estimator for that continuous measurement. Unfortunately, the above
data took a week to generate on a quad-core workstation, indicating that the technical
challenges already present in simulating an N = 1000 quantum particle set for the
depicted range of F limit the quality of the statistics. As previously mentioned,
the particle filter approximation is inherently biased, with the variance of estimates
converging as $N^{-1}$. The inset in figure 5.4 shows the sample estimator deviation
$S_{pf}$, which is the deviation in the actual performance error of the particle filter on
each individual run, i.e. $B_{pf} - B$ where the true B = 0. In other words, $\Delta B_{pf}$ is
the uncertainty calculated for an individual trajectory from the particle distribution
{pf^}, which is averaged over many trajectories to get ABpf. However, an individual 
run of the particle filter also gives an estimate Bpf of the true magnetic field B. Since 
we know that the measurements were generated from a system evolved with B = 0, 
we can calculate the deviation in the actual estimates Bpf. If the particle filter 
were unbiased, we would expect this sample deviation to equal the average particle 
filter deviation, i.e. $S_{pf} = \Delta B_{pf}$. Instead, the sample deviation dwarfs the average
estimator uncertainty, indicating that the particle filter bias dominates. As discussed 
in [Chase and Geremia 2009a], this bias seems to be due to the prior distribution 
considered for B. Ideally, we would want this distribution to have infinite variance in 
order to be truly unbiased, but that is not practical for the particle filter simulations. 
Instead, future work will need to consider alternate strategies for eliminating this bias 
in practice. 

Numerical simulation also provides insight into how the Gaussian state assumption
of the Kalman filter applies in the double-pass case. Figure 5.5 shows quantum
states evolved under two different noise realizations with B = 0, $M = 10\tau^{-1}$,
$K = 0.0006\tau^{-1}$. Both states were initially spin-polarized along +x and evolved under the
full double-pass SSE in Eq. (5.28). The Q-function shown is defined as

$$ Q(\theta, \phi, t) = |\langle \theta, \phi | \psi_t \rangle|^2 \tag{5.88} $$

where the spin-coherent state $|\theta, \phi\rangle$ is the +F eigenstate of the spin-operator

$$ F_x \sin\theta \cos\phi + F_y \sin\theta \sin\phi + F_z \cos\theta. \tag{5.89} $$

Although one example shows a Gaussian squeezed spin state, the other shows a
state with a bimodal Gaussian distribution. Such a state is poorly described by the
Gaussian family in Eq. (5.46) and helps explain why the Kalman filter fails to find a
difference between the single and double-pass setup. These plots suggest a family of
bimodal Gaussian states might result in a useful projection filter. I have been unable
to find a parameterization of such a family which admits an analytic derivation of a
projection filter.
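
For readers who want to reproduce plots of this kind, the Q-function of Eq. (5.88) can be evaluated directly from its definition. The sketch below assumes NumPy and finds the spin-coherent state as the top eigenvector of the operator in Eq. (5.89); the function names `spin_matrices` and `q_function` are illustrative. Diagonalizing at every angle is simple but slow, so a production code would more likely rotate a fixed reference state.

```python
import numpy as np

def spin_matrices(F):
    """Spin-F matrices in the F_z eigenbasis, ordered m = F, F-1, ..., -F."""
    m = np.arange(F, -F - 1, -1, dtype=float)
    Fz = np.diag(m).astype(complex)
    ladder = np.sqrt(F * (F + 1) - m[1:] * (m[1:] + 1))
    Fplus = np.diag(ladder, k=1).astype(complex)
    Fx = (Fplus + Fplus.conj().T) / 2
    Fy = (Fplus - Fplus.conj().T) / 2j
    return Fx, Fy, Fz

def q_function(psi, theta, phi, F):
    """Q(theta, phi) = |<theta, phi|psi>|^2 for a normalized spin-F state psi."""
    Fx, Fy, Fz = spin_matrices(F)
    n_dot_F = (Fx * np.sin(theta) * np.cos(phi)
               + Fy * np.sin(theta) * np.sin(phi)
               + Fz * np.cos(theta))
    _, vecs = np.linalg.eigh(n_dot_F)
    coherent = vecs[:, -1]                 # +F eigenstate of n.F
    return abs(np.vdot(coherent, psi)) ** 2

if __name__ == "__main__":
    F = 5
    Fx, _, _ = spin_matrices(F)
    psi = np.linalg.eigh(Fx)[1][:, -1]     # spin-coherent state along +x
    print(q_function(psi, np.pi / 2, 0.0, F))   # peak of the distribution
    print(q_function(psi, 0.0, 0.0, F))         # overlap with the +z coherent state
```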



5.6 Summary 



In this chapter, we have explored the use of double-pass continuous measurement 
for precision magnetometry. The primary result involves numerical simulations of 
the quantum Cramer-Rao bound which indicate that a double-pass system shows an 
improved magnetic field uncertainty scaling with atom number over a comparable 
single-pass system, albeit only for particular choices of coupling strengths relative to 
the collective spin size. This is in contrast to quantum information theoretic bounds 
which suggest that the Heisenberg limit bounds the uncertainty scaling for both 
a single and double-pass system. Clearly, future work aimed at reconciling these 
results is necessary, particularly deriving analytic quantum Cramer-Rao bounds for 
unbounded ancilla systems. We have also explored estimators intended to achieve 
the uncertainty scaling seen in numerical simulations. Taking a brute force approach, 
quantum particle filters show evidence of the improved double-pass scaling, although 
the results suffer from limited statistics which cannot be significantly improved
with current computational power. More practical quantum Kalman filters show
no improved sensitivity, which is consistent with an observed breakdown in the
Gaussian state assumption used to derive them. However, the general projection
filtering technique used in the Kalman filter derivation provides an avenue for deriving
more appropriate filters which might prove more tractable for practical magnetic 
field estimation. More generally, similar effective nonlinear interactions may prove 
an important tool in precision measurement. 
5.A Converting between Ito and Stratonovich SDE



For the double-pass Ito SSE in Eq. (5.28), we begin the conversion by noting that
states with entirely real amplitudes form an invariant set and therefore write
$|\psi\rangle_t = \sum_{m=-F}^{F} x_t^m |m\rangle$. The stochastic coefficient is then



m=—F m,n=—F 
. F 

+ -v^5Z - m)(F + m + l)xr\m + 1) 



m,=—F 



-\/{F + m){F -m + l)xr\m - 1) 
which has as its j-th entry 

F 



M{j - J2 n{xn'')xt' 

n=-F 

^ ^^/iF-j + l){F + j)xt 



-v/(F + j + l)(F 



3)xt 



The derivative with respect to Xt'' is then 

F 

nixt^y.k 



db^{t,xt) 



dxk 



M2kxt^xt^ 



n=-F 



+ 



v/(F-j + l)(F + j)% 



i),fe 



-v/(F + j + 1)(F-j)Wa 
so that the sum in Eq. (2.74) is 



(5.90) 



(5.91) 



Xt 



k=-F 



dV{t,Xt) 
dxf^ 



nixt^'Wit, Xi)-2v^^ A;x/6^(t, Xt)xt^ 



n=-F 



+ 



^(F-j + l)iF + j)I^-\t,Xt) - y^iF + j + l)iF-j)V+\t,Xt) 

(5.92) 
This suggests an equivalent operator form



M{F, 



2VM(F,{VM{F,-(FA) + iVKF^)) = 



M{F, -{F,))+ iVKFy - 2M ( AF^ ) - 2iVKM {F,Fy) , (5.93) 



where $\langle \Delta F_z^2 \rangle = \langle F_z^2 \rangle - \langle F_z \rangle^2$, so the Stratonovich SSE is



d\iP)t 



-i-fBFy-M {F,-{F,)f-(AF, 



+2i\fKM {f,J^ Fy + i^/KM (P^Fy 



+ 



tdt 



(yM{F, - + iy/KFy^ \^^J)t o dWf (5.94) 
Chapter 6

Feedback controllers for quantum error correction



In this chapter, I review quantum feedback protocols for performing continuous- 
time quantum error correction. After studying the structure of the quantum filter, I 
describe a low-dimensional representation which, although inexact, gives rise to the
same feedback performance as the exact quantum filter. The work presented here is
published in [Chase et al. 2008] and I refer the reader to [Gottesman 1997; Nielsen 
and Chuang 2000] for a thorough introduction to quantum error correction. 



6.1 Introduction 



Quantum error correction is inherently a feedback process where the error syndrome 
of encoded qubits is measured and used to apply conditional recovery operations 
[Gottesman 1997]. Most formulations of quantum error correction treat this feedback 
process as a sequence of discrete steps. Syndrome measurements and recovery oper- 
ations are performed periodically, separated by a time-interval chosen small enough 
to avoid excessive accumulation of errors but still comparable to the time required to 
implement quantum logic gates [Gottesman 1997; Nielsen and Chuang 2000]. There 
is, however, mounting evidence from the field of real-time quantum feedback control 
[Armen et al. 2002; Bouten et al. 2009; Wiseman 1994] that continuous observation
processes offer new, sometimes technologically advantageous, opportunities for
quantum information processing.

Toward this end, Ahn, Doherty and Landahl (ADL) [Ahn et al. 2002] devised a 
scheme to implement general stabilizer quantum error correction [Gottesman 1997] 
using continuous measurement and feedback. Unfortunately an exact implementation
of the ADL scheme is computationally demanding. For an n-qubit code,
the procedure requires one to time-evolve a $2^n$-dimensional density matrix for the
logical qubit alongside the quantum computation [Ahn et al. 2002]. This classical
information-processing overhead must be performed to interpret the continuous-time
error syndrome measurement data and determine how recovery operations, in the
form of a time-dependent feedback Hamiltonian, should be applied. While n is a
constant for any particular choice of code, even modest codes such as the five-qubit
code [Bennett et al. 1996; Laflamme et al. 1996] and the seven-qubit Steane code
[Steane 1996] push classical computers to their limits. Despite state-of-the-art
experimental capabilities, it would be extremely difficult to implement the ADL bit-flip
code in practice. Consequently, Ahn and others have devised alternate feedback
protocols which are less demanding [Ahn et al. 2004; Sarovar et al. 2004], but perform
worse than the original ADL scheme.

Recently, van Handel and Mabuchi addressed the computational overhead of 
continuous-time error syndrome detection [van Handel and Mabuchi 2005a] using 
techniques from quantum filtering theory presented in Chapter 3. They developed an 
exact, low-dimensional model for continuous-time error syndrome measurements, but 
did not go on to treat continuous-time recovery. The complication is that any feed- 
back Hamiltonian suitable for correcting errors during the syndrome measurements 
violates the dynamical symmetries that were exploited to obtain the low-dimensional 
filter in Ref. [van Handel and Mabuchi 2005a]. While one might address this com- 
plication by simply postponing error recovery operations until a point where the 
measurements can be stopped, there may be scenarios where it would be preferable 
to perform error recovery in real-time. For example, if the recovery operation is not
instantaneous, responding to errors as they occur might outperform protocols where 
there are periods without any error correction. 

In this chapter, I extend the quantum filtering approach developed by van Handel 
and Mabuchi to include recovery operations. I further consider an error-correcting 
feedback Hamiltonian of the form devised by Ahn, Doherty and Landahl, but the 
approach readily extends to other forms for the feedback. While an exact low- 
dimensional model for continuous-time stabilizer generator measurements in the pres- 
ence of feedback does not appear to exist, I present an approximate filter that is still 
low-dimensional, yet sufficiently accurate such that high-quality error correction is 
possible. 



6.2 Continuous-Time Quantum Error Correction 

For our purposes, a quantum error correcting code is a triple $(E, \mathcal{G}, R)$. The quantum
operation $E : \mathbb{C}^{2^k} \mapsto \mathbb{C}^{2^n}$ encodes k logical qubits in n physical qubits. $\mathcal{G}$ is a set
of $l = n - k$ stabilizer generator observables with outcomes $\pm 1$ that define the error
syndrome. $R : \{\pm 1\}^{\otimes l} \mapsto \mathbb{C}^{2^n \times 2^n}$ is the recovery operation, which specifies what
correction should be applied to the physical qubits in response to the syndrome
measurement outcomes.

The particular choice of code $(E, \mathcal{G}, R)$ is usually made with consideration for
the nature of the decoherence affecting the physical qubits [Knill et al. 2000]. For
example, the bit-flip code (considered by both ADL and van Handel and Mabuchi)
improves protection against an error channel that applies the Pauli $\sigma_x$ operator to
single qubits at a rate $\gamma$. Here, we adopt the notation that $X_n$ represents the Pauli
$\sigma_x$ operator on qubit n, and similarly for $Y_n$ and $Z_n$. In the bit-flip code, E encodes
k = 1 qubits in n = 3 qubits by the map $\alpha|0\rangle + \beta|1\rangle \mapsto \alpha|000\rangle + \beta|111\rangle$. The l = 2
stabilizer generators are $g_1 = ZZI := \sigma_z \otimes \sigma_z \otimes I$ and $g_2 = IZZ := I \otimes \sigma_z \otimes \sigma_z$; each
extracts the parity of different qubit pairs. The recovery R, given the outcomes of
measuring $(g_1, g_2)$, is defined by $(+1, +1) \mapsto I$, $(+1, -1) \mapsto X_3$, $(-1, +1) \mapsto X_1$ and
$(-1, -1) \mapsto X_2$.
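
The bit-flip recovery map can be illustrated on computational basis strings, where the two stabilizer generators reduce to parity checks. This is a minimal sketch in plain Python; the names `RECOVERY`, `syndrome` and `recover` are illustrative, and the classical bit-string picture is a simplification that ignores superpositions.

```python
# Syndrome (g1, g2) -> 0-based index of the qubit to flip, or None for the identity.
RECOVERY = {(+1, +1): None, (+1, -1): 2, (-1, +1): 0, (-1, -1): 1}

def syndrome(bits):
    """Outcomes of g1 = ZZI and g2 = IZZ on a computational basis string."""
    g1 = +1 if bits[0] == bits[1] else -1
    g2 = +1 if bits[1] == bits[2] else -1
    return g1, g2

def recover(bits):
    """Apply the recovery operation selected by the measured syndrome."""
    flip = RECOVERY[syndrome(bits)]
    out = list(bits)
    if flip is not None:
        out[flip] ^= 1
    return tuple(out)

if __name__ == "__main__":
    for corrupted in [(1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 0, 0)]:
        print(corrupted, syndrome(corrupted), recover(corrupted))
```

Any single bit flip on 000 or 111 is returned to the original codeword, while two simultaneous flips are miscorrected, which is the expected limitation of a distance-three code.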

In this chapter, we focus primarily on the five-qubit code (n = 5, k = 1) that
increases protection against general separable channels, and in particular the
continuous-time symmetric depolarizing channel that applies all three Pauli operators to each
of the physical qubits at the same rate $\gamma$. The five-qubit code has l = 4 stabilizer
generators {XZZXI, IXZZX, XIXZZ, ZXIXZ}. It is also a perfect code in that
all 16 distinct syndrome outcomes indicate distinct errors: one corresponding to the 
no-error condition, and one syndrome for each of the three Pauli errors on each of the 
five qubits. I defer to [Gottesman 1997; Nielsen and Chuang 2000] for the encoding 
and recovery procedures for this code. 
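
The statement that all 16 syndromes label distinct errors can be checked by a short enumeration. The sketch below, in plain Python, represents Pauli operators as strings and uses the fact that two Pauli strings anticommute exactly when they differ on an odd number of positions where both are non-identity; the function names are illustrative.

```python
from itertools import product

GENERATORS = ["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ"]

def anticommutes(p, q):
    """True if Pauli strings p and q anticommute."""
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 1

def syndrome(error):
    """Stabilizer generator outcomes (+1 or -1) after the given Pauli error."""
    return tuple(-1 if anticommutes(error, g) else +1 for g in GENERATORS)

def single_qubit_errors(n=5):
    """The no-error string followed by all single-qubit Pauli errors."""
    yield "I" * n
    for m, pauli in product(range(n), "XYZ"):
        yield "I" * m + pauli + "I" * (n - m - 1)

if __name__ == "__main__":
    table = {syndrome(e): e for e in single_qubit_errors()}
    for s, e in sorted(table.items(), reverse=True):
        print(s, e)
    print(len(table))   # number of distinct syndromes
```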



6.2.1 Stabilizer Generator Measurements 

Quantum error correction can be extended to continuous time by replacing discrete
measurements of the stabilizer generators $g_1, \ldots, g_l$ with a set of l continuous
observation processes $dQ_t^{(l)}$ [Ahn et al. 2002]. We do not consider here how one might
implement the set of l simultaneous stabilizer generator observations other than to
comment that doing so in an AMO technology would likely involve coupling the n
physical qubits to a set of electromagnetic field modes and then performing
continuous photodetection on the scattered fields. While this model is rather general, we
take the same measurement strength $\kappa$ for each qubit, implying symmetric coupling
of the qubits.

Following the techniques in Chapter 3, one arrives at the following form of the
quantum filter for the conditional density matrix $\rho_t$:

$$ d\rho_t = \gamma \sum_{m=1}^{n} \sum_{j} \mathcal{D}[\sigma_j^{(m)}] \rho_t\, dt + \kappa \sum_{l} \mathcal{D}[g_l] \rho_t\, dt + \sqrt{\kappa} \sum_{l} \mathcal{H}[g_l] \rho_t \left( dQ_t^{(l)} - 2\sqrt{\kappa}\, \mathrm{Tr}[g_l \rho_t]\, dt \right) - i [H_t, \rho_t]\, dt, \tag{6.1} $$

where $j \in \{x, y, z\}$, the sums over l run over the stabilizer generators, and the
superoperators are defined as $\mathcal{D}[\sigma]\rho = \sigma \rho \sigma - \rho$ and
$\mathcal{H}[g_l]\rho = g_l \rho + \rho g_l - 2\, \mathrm{Tr}[g_l \rho]\, \rho$. The innovations

$$ dW_t^{(l)} = dQ_t^{(l)} - 2\sqrt{\kappa}\, \mathrm{Tr}[g_l \rho_t]\, dt \tag{6.2} $$

obtained from the measurements $dQ_t^{(l)}$ are independent Wiener processes, each with
$E[dW_t] = 0$ and $dW_t^2 = dt$. The first term in the filtering equation accounts for the
action of the continuous-time symmetric depolarizing channel. The time evolution
$\rho_t$ generated by a particular noise realization is generally called a trajectory.

The final term in Eq. (6.1) describes the action of the time-dependent feedback
Hamiltonian used to implement error recovery. Following Ahn, Doherty and Landahl,
we choose the feedback Hamiltonian to be of the form

$$ H_t = \sum_{m=1}^{n} \sum_{j} \lambda_j^{(m)} \sigma_j^{(m)}, \tag{6.3} $$

which corresponds to applying Pauli operators $\sigma_j^{(m)}$ to each qubit with a controllable
strength $\lambda_j^{(m)}$. The policy for determining the feedback strengths $\lambda_j^{(m)}$ at each point in
time should be chosen optimally. Ahn, Doherty, and Landahl obtained their feedback
policy by defining the codespace projector $\Pi_0$ onto the no error states (states which
are +1 eigenvectors of all stabilizers) and then maximizing the codespace fidelity
$\mathrm{Tr}[\Pi_0 \rho_t]$. Assuming a maximum feedback strength $\lambda_{max}$, the resulting feedback
policy is given by setting

$$ \lambda_j^{(m)} = \lambda_{max}\, \mathrm{sgn}\left( \mathrm{Tr}\left[ -i [\Pi_0, \sigma_j^{(m)}] \rho_t \right] \right). \tag{6.4} $$
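
The superoperators and the feedback policy above translate directly into matrix expressions. The following is a minimal sketch assuming NumPy, using the three-qubit bit-flip code rather than the five-qubit code to keep the matrices small; the function names `D`, `H` and `feedback_strength`, and the example state used in the usage lines, are illustrative assumptions.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def kron_all(ops):
    """Tensor product of a list of single-qubit operators."""
    return reduce(np.kron, ops)

def D(sigma, rho):
    """D[sigma] rho = sigma rho sigma - rho, for Hermitian unitary sigma."""
    return sigma @ rho @ sigma - rho

def H(g, rho):
    """H[g] rho = g rho + rho g - 2 Tr[g rho] rho."""
    return g @ rho + rho @ g - 2 * np.trace(g @ rho).real * rho

def feedback_strength(proj0, sigma, rho, lam_max):
    """lam_max * sgn(Tr[-i [Pi_0, sigma] rho]), the fidelity-maximizing policy."""
    rate = np.trace(-1j * (proj0 @ sigma - sigma @ proj0) @ rho).real
    return lam_max * np.sign(rate)

# Bit-flip code: stabilizer generators and codespace projector.
g1 = kron_all([Z, Z, I2])
g2 = kron_all([I2, Z, Z])
P0 = (np.eye(8) + g1) @ (np.eye(8) + g2) / 4
X1 = kron_all([X, I2, I2])

if __name__ == "__main__":
    # Illustrative state partway through a coherent X_1 rotation out of |000>.
    psi = np.zeros(8, dtype=complex)
    psi[0], psi[4] = np.cos(0.3), 1j * np.sin(0.3)
    rho = np.outer(psi, psi.conj())
    print(feedback_strength(P0, X1, rho, lam_max=2.0))
```

Note that the policy is driven by coherences between the codespace and an error space: for a state that sits entirely inside one syndrome space the trace vanishes and the computed strength is zero.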
Computational Expense

Because this is a closed-loop strategy, the feedback controller must determine each
$\lambda_j^{(m)}$ from the evolving measurement in real time. The utility of feedback in any
real setting then relies greatly upon the controller's ability to integrate the filtering
equation rapidly enough to maintain pace with the quantum dynamics of the qubits.
For the five-qubit code, 1024 — 1 real parameters are needed to represent the density 
matrix. We found that stable numerical integration via the techniques in Appendix B 
for even a single trajectory required approximately 36 seconds on a 2.1 GHz desktop 
computer {'ydt ^ 10~^ over a timespan [0,0.257]). This is far from adequate for use 
in an actual feedback controller even in state-of-the-art experiments. 

Moreover, Eq. (6.1) is a nonlinear filter, and for such filters it is rarely possible 
to evaluate even qualitative properties analytically. One must then average over an 
appreciable number of trajectories to find the expected behavior of quantities such 
as the codespace fidelity as a function of time. For the five-qubit code, our integrator 
requires approximately 10 hours to simulate 1000 trajectories. 



6.2.2 Reduced-Dimensional Filters 



Considering that the syndrome measurements yield information about correlations 
between qubits and not information about the individual states of the qubits, one 
can imagine that propagating the full density matrix is excessive. Indeed, the ADL 
scheme only makes use of the projection of $\rho_t$ onto the codespace, generating the
same feedback policy regardless of which state $\rho_0$ in the codespace is initially chosen.
It is reasonable to expect that a lower dimensional model could track solely the 
information extracted from the syndrome measurements. This is exactly the premise 
used by van Handel and Mabuchi to obtain a low-dimensional model of continuous- 
time stabilizer generator measurements (in the absence of feedback) [van Handel and 
Mabuchi 2005a]. They formulate the problem as a graph whose vertices correspond
to syndromes and whose edges reflect the action of the error model. The filtering 
problem is then reduced to tracking the node probabilities, i.e., the likelihoods for 
the qubit to be described by each of the various syndrome conditions. Dynamical 
transitions occur between the syndromes due to the error channel, and the filter 
works to discern these transitions from the stabilizer measurement data. 

For an $(E, \mathcal{G}, R)$ code, van Handel and Mabuchi define a set of projectors onto the
distinct syndrome spaces. For the five-qubit code, there are 16 such projectors; $\Pi_0$ is
the codespace projector as before and $\Pi_j^{(m)} = \sigma_j^{(m)} \Pi_0 \sigma_j^{(m)}$ are projectors onto states
with a syndrome consistent with a $\sigma_j$ error on qubit m. Forming the probabilities

$$ p_{j,t}^{(m)} = \mathrm{Tr}[\Pi_j^{(m)} \rho_t] \tag{6.5} $$

into a vector $p_t$ and computing $dp_{j,t}^{(m)}$ from the full dynamics leads to the reduced
filter

$$ dp_t = \Lambda p_t\, dt + 2\sqrt{\kappa} \sum_{l} (H_l - h_l^T p_t\, I)\, p_t\, dW_t^{(l)} \tag{6.6} $$

with $\Lambda_{rs} = \gamma(1 - 16\delta_{rs})$, $h_l$ the vector of outcomes of measuring $g_l$ on each syndrome
space $\Pi_j^{(m)}$, and $H_l = \mathrm{diag}\, h_l$
(Eq. (4) in Ref. [van Handel and Mabuchi 2005a]). The equations for $p_{j,t}^{(m)}$ are closed
and encapsulate all the information that is gathered from measuring the stabilizer
generators. Equation (6.6) is an example of a Wonham filter, which is the classical
optimal filter for a continuous-time finite-state Markov chain with an observation
process driven by white noise [Wonham 1965]. Further discussion of the Wonham
filter and its use in conjunction with discrete-time error correction can be found, e.g.,
in Ref. [van Handel and Mabuchi 2005a].
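
As a rough illustration of how cheaply Eq. (6.6) can be propagated compared with the full density matrix, the following sketch performs one Euler-Maruyama step of the Wonham filter for 16 syndrome states. It assumes NumPy; the function name `wonham_step`, the ordering of syndrome states (state s is assigned the sign pattern given by the bits of s), and the clip-and-renormalize safeguard are assumptions made here for illustration and are not taken from the cited reference.

```python
import numpy as np

def syndrome_signs(n_generators=4):
    """h[l, s] = +1 or -1: outcome of generator l on syndrome state s (assumed ordering)."""
    n_states = 2 ** n_generators
    return np.array([[1.0 - 2.0 * ((s >> l) & 1) for s in range(n_states)]
                     for l in range(n_generators)])

def wonham_step(p, dW, gamma, kappa, h, dt):
    """One Euler-Maruyama step of dp = Lambda p dt + 2 sqrt(kappa) sum_l (H_l - h_l.p) p dW_l."""
    n = p.size
    Lam = gamma * (np.ones((n, n)) - n * np.eye(n))   # gamma (1 - 16 delta_rs) for n = 16
    dp = Lam @ p * dt
    for h_l, dW_l in zip(h, dW):
        dp += 2.0 * np.sqrt(kappa) * (h_l * p - (h_l @ p) * p) * dW_l
    # Numerical safeguard (not part of the filter itself): keep p a probability vector.
    p_new = np.clip(p + dp, 0.0, None)
    return p_new / p_new.sum()

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    h = syndrome_signs()
    p = np.zeros(16)
    p[0] = 1.0                                        # start in the codespace
    dt = 1e-4
    for _ in range(1000):
        dW = rng.normal(0.0, np.sqrt(dt), size=4)     # stand-in innovations increments
        p = wonham_step(p, dW, gamma=1.0, kappa=50.0, h=h, dt=dt)
    print(p[0], p.sum())
```

In a real controller the increments `dW` would be the innovations computed from the measurement record via Eq. (6.2) with $\mathrm{Tr}[g_l \rho_t]$ replaced by $h_l^T p_t$, rather than freshly sampled noise.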
6.3 Error Correction with Feedback

We now extend Eq. (6.6) to include a feedback Hamiltonian suitable for error
recovery. Following van Handel and Mabuchi's lead, we see that Eq. (6.6) was derived
by taking $dp_{j,t}^{(m)} = \mathrm{Tr}[\Pi_j^{(m)} d\rho_t]$ for a basis which closed under the dynamics of the
continuous syndrome measurement. One hope is that simply adding the feedback
term in by calculating $\mathrm{Tr}[-i \lambda_k^{(r)} \Pi_j^{(m)} [\sigma_k^{(r)}, \rho_t]]$ also results in a set of closed equations.
However, that is not the case when using the basis of the sixteen syndrome space
projectors $\Pi_j^{(m)}$. Specifically, $[\Pi_j^{(m)}, \sigma_k^{(r)}]$ cannot be written as a linear combination
of syndrome space projectors. This is not surprising as the feedback Hamiltonian
term under consideration is the only term which generates unitary dynamics.

Inspired by the form of the commutator between the feedback and the syndrome
space projectors, we define feedback coefficient operators

$$\Pi_{j,c}^{(m)} = (+i \text{ or } +1)\,\sigma_a\,\Pi_j^{(m)}\,\sigma_b , \tag{6.7}$$

where $c$ is an arbitrarily chosen index used to distinguish the $i$ or $1$ prefactor and
the combination of Pauli matrices $\sigma_a$, $\sigma_b$ which sandwich the syndrome space projector $\Pi_j^{(m)}$.
For the five-qubit code, the syndrome projectors are simply those operators which
have the $1$ prefactor and 10 identity matrices. The corresponding feedback coefficient
is $p_{j,c}^{(m)} = \mathrm{Tr}[\Pi_{j,c}^{(m)} \rho_t]$. If we then iterate the dynamics of the filter (6.1) by calculating
$dp_{j,c}^{(m)}$ starting from the syndrome space projectors, we find that each feedback
Hamiltonian term generates pairs of feedback coefficient terms. For example, calculating
the dynamics due to feedback $X_1$ on $\Pi_0$ generates two feedback coefficient operators:
$\Pi_{0,0} = i\Pi_0 X_1$ and $\Pi_{0,1} = iX_1\Pi_0$. We must then determine the dynamics for these
first-level feedback coefficients. This will include calculating the feedback on $\Pi_{0,1}$,
which generates second-level feedback coefficients $\Pi_{0,2} = X_1 Y_2 \Pi_0$ and $\Pi_{0,3} = X_1 \Pi_0 Y_2$.
Continuing to iterate feedback coefficient terms, we find that an additional 1008
distinct $p_{j,c}^{(m)}$ terms are needed to close the dynamics and form a complete basis. Adding
in the initial 16 syndrome space projectors gives a 1024-dimensional basis, clearly no
better than propagating the full density matrix. However, it is now relatively easy to
calculate the feedback strengths, which depend only on pairs of first-level feedback
coefficients. For example, from Eq. (6.4) we find that $\lambda_x^{(1)} = \lambda_{\max}\,\mathrm{sgn}(-p_{0,0} + p_{0,1})$,
where $p_{0,0} = \mathrm{Tr}[\Pi_{0,0}\rho_t]$ and $p_{0,1} = \mathrm{Tr}[\Pi_{0,1}\rho_t]$ are first-level coefficients developed
earlier in the paragraph.

Figure 6.1: Non-zero matrix elements of (a) untruncated and (b) truncated
filter. Blue squares correspond to decoherence terms, red crosses correspond
to measurement terms and green dots correspond to feedback terms. Note the
difference in dimension of the matrices.

6.3.1 Approximate Filter for the Five-Qubit Code 

Although the dimension of the alternate basis is no smaller than the dimension
of the full density matrix, the structure of the filter represented in the alternate
basis provides a manner for interpreting the relative importance of the $p_{j,c}^{(m)}$ feedback
coefficients. This is best seen graphically in Fig. 6.1(a), which superimposes the
non-zero matrix elements coming from the noise, measurement and feedback terms. Both
measurement and noise are block diagonal as expected; it is the feedback that couples
blocks together in a hierarchical fashion. This hierarchy can be parameterized by
the number of "feedback transitions" which connect a given feedback coefficient to 
the syndrome space block. For example, the upper left block, which corresponds 
to the syndrome space projectors, is connected via feedback terms to the first level 
feedback block, whose feedback coefficients are each one feedback transition away 
from the syndrome space block. In turn, the first level block is then connected to a 
second level feedback block, whose feedback coefficients are two feedback transitions 
away from the syndrome block. 

Given that the initial state starts within the codespace and given that feedback 
is always on, the feedback coefficients that are more than one feedback transition 
from the syndrome space block should be vanishingly small. Limiting consideration 
to these first two blocks, we also find that pairs of feedback coefficients couple
identically to the syndrome space block. For example, we find that $-iX_1\Pi_0$ and $i\Pi_0 X_1$
couple to syndrome space projectors identically. This is not surprising, as these
two terms comprise the commutator that results from the $X_1$ feedback Hamiltonian.
However, outside the first level of feedback transitions, the matrix elements of these
feedback coefficients differ. Additionally, feedback coefficients involving feedback
Hamiltonians which correspond to a syndrome error on the codespace projector are
related as

$$-i\sigma_j^{(m)}\Pi_0 + i\Pi_0\sigma_j^{(m)} = -i\Pi_j^{(m)}\sigma_j^{(m)} + i\sigma_j^{(m)}\Pi_j^{(m)} . \tag{6.8}$$

For the feedback coefficient examples just mentioned, this relation is $-iX_1\Pi_x^{(1)} +
i\Pi_x^{(1)}X_1 = -i\Pi_0 X_1 + iX_1\Pi_0$. Truncating the dynamics to include only the first level
of feedback and combining distinct feedback coefficients which act identically within
this block results in the matrix of Fig. 6.1(b) over only 136 basis elements. Note
that the controller now only needs to reference a single basis element for calculating
a given feedback strength $\lambda_j^{(m)}$.
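
The projector structure used above can be checked numerically. The sketch below builds the codespace projector of the five-qubit code from the standard stabilizer generators XZZXI, IXZZX, XIXZZ and ZXIXZ (an assumption here, since the generators are not listed in this section), forms the 16 syndrome space projectors, and evaluates both sides of Eq. (6.8) for an $X$ error on the first qubit. All helper names are illustrative.

```python
import numpy as np
from functools import reduce

PAULI = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_string(label):
    """Tensor product of single-qubit Paulis, e.g. 'XZZXI'."""
    return reduce(np.kron, [PAULI[ch] for ch in label])

def single_qubit(op, m, n=5):
    """Pauli `op` acting on qubit m (0-based) of n qubits."""
    return pauli_string("I" * m + op + "I" * (n - m - 1))

# Standard five-qubit code generators (assumed; not listed in this section).
GENERATORS = ["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ"]

def codespace_projector():
    """Pi_0 as the product of (I + g)/2 over the stabilizer generators."""
    dim = 2 ** 5
    proj = np.eye(dim, dtype=complex)
    for g in GENERATORS:
        proj = proj @ (np.eye(dim) + pauli_string(g)) / 2
    return proj

def syndrome_projectors():
    """Pi_0 plus the 15 projectors sigma_j^(m) Pi_0 sigma_j^(m)."""
    pi0 = codespace_projector()
    projs = {("0", None): pi0}
    for m in range(5):
        for j in "XYZ":
            s = single_qubit(j, m)
            projs[(j, m)] = s @ pi0 @ s
    return projs

projs = syndrome_projectors()
pi0 = projs[("0", None)]
x1 = single_qubit("X", 0)
pix1 = projs[("X", 0)]

# Both sides of Eq. (6.8) for sigma_j^(m) = X on the first qubit.
lhs = -1j * x1 @ pi0 + 1j * pi0 @ x1
rhs = -1j * pix1 @ x1 + 1j * x1 @ pix1
```

Since the five-qubit code is perfect and non-degenerate, the 16 projectors should resolve the identity on the 32-dimensional space, which makes a convenient sanity check on the generators.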
6.3.2 Approximate Filter for General Codes

The truncation scheme generalizes for reducing the dimensionality of the quantum
filter for an arbitrary (E, Q, R) code. Such a filter for an $[n, k]$ quantum
error-correcting code [Nielsen and Chuang 2000] has the same form as Eq. (6.1), but
involves $n$ physical qubits and $l = n - k$ continuous-time stabilizer generator
measurements. In the following, we assume the continuous-time symmetric depolarizing
channel, though it should be straightforward to extend to other noise models. For a
non-perfect, non-degenerate code, there are a total of $2^{n-k}$ stabilizer generator
measurement outcomes, but only $3n + 1$ will be observed for the given noise channel.
For a perfect, non-degenerate code ($2^{n-k} = 3n + 1$), all possible syndrome outcomes
are observed. In either case, given the observable syndrome outcomes, we can define
$3n + 1$ syndrome space projectors and feedback parameters needed for recovery.
Degenerate codes require fewer than $3n$ recovery operations, as distinct actions of
the noise channel give rise to identical errors and recovery operations. The degeneracy
depends greatly on the particular code, so we merely note that degenerate codes
will require fewer syndrome space projectors and feedback parameters than their
non-degenerate relatives.

Once we determine the syndrome space projectors and feedback parameters for 
the code, we can introduce feedback coefficient operators of the form of (6.7) but 
over n qubits. A truncated filter is constructed as follows. 

1. Close the dynamics of the syndrome space projectors by introducing first-level 
feedback terms. 

2. Close the dynamics of the first-level feedback terms by truncating to a basis of 
syndrome space and first-level feedback terms, i.e. throw out potential second- 
level feedback terms. 

3. Each of the $3n + 1$ syndrome space projectors in this truncated form has
$3n$ feedback coefficients, with pairs of these terms comprising each feedback
Hamiltonian commutator. Moreover, there is a factor of degeneracy between
syndrome space projectors and feedback coefficients which involve the same
Pauli matrix [cf. Eq. (6.8)]. A similarity transform is used to combine these
pairs, leaving $(3n + 1) + (3n + 1)3n/2 = \frac{1}{2}(2 + 9n(n + 1))$ basis elements in the
fully truncated filter.

Figure 6.2: On the left, a schematic diagram of truncating the filter to only
syndrome space and first level feedback blocks. On the right, just a few of
the 1024 feedback coefficients of the five-qubit code representing the different
feedback block levels.



The truncated filter requires only $O(n^2)$ basis elements, as compared to the $4^n$
parameters for the full density matrix. Additionally, the feedback strengths in Eq.
(6.4) are readily calculated from the combined first-level feedback coefficients. The
truncation process is depicted schematically in the left half of Fig. 6.2. The right 
half of the figure gives examples of a few of the 1024 terms involved in the truncation 
procedure for the five-qubit code. 
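
The basis count from step 3 is easy to tabulate against the size of the full density matrix. The short sketch below does so for a few illustrative values of $n$, assuming a non-degenerate code with $3n + 1$ syndrome space projectors; the function names are chosen for this example only.

```python
def truncated_basis_size(n):
    """(3n + 1) syndrome projectors plus (3n + 1) * 3n / 2 combined first-level terms."""
    return (3 * n + 1) + (3 * n + 1) * 3 * n // 2

def density_matrix_size(n):
    """Number of parameters in the full n-qubit density matrix."""
    return 4 ** n

for n in (5, 7, 9):
    print(n, truncated_basis_size(n), density_matrix_size(n))
```

For $n = 5$ this reproduces the 136 basis elements of the truncated five-qubit filter, against 1024 for the full description.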



Chapter 6. Feedback controllers for quantum error correction 







Figure 6.3: Numerical simulations of the five-qubit code to assess the average
codespace fidelity. Plot (a) compares the codespace fidelity (averaged
over 10 trajectories) for filters with different levels of truncation: the full
(1024-dimensional) and first-level truncated (136-dimensional) filters are essentially
identical. Plot (b) shows the codespace fidelity averaged over 2,000 trajectories
using the truncated 136-dimensional filter for error correction. In both plots
the horizontal axis is time (units of $\gamma$), running from 0.00 to 0.25. (Simulation
parameters: $\lambda_{\max} = 200\gamma$ and $\kappa = 100\gamma$.) (Color online.)

6.3.3 Numerical Simulation 

Since the truncated filter is also nonlinear, it is difficult to provide analytic bounds 
on possible degradation in performance. However, we can easily compare numerical 
simulation between feedback controllers which use the full or truncated filter. In 
fact, the dynamics should be close for the same noise realizations, indicating that 
they should be close per trajectory. 

In order to analyze the feedback controller's performance, the full filter Eq. (6.1) 
is used to represent the underlying physical system. The feedback controller was 
modeled by simultaneously integrating the truncated filter, driven by the measure- 
ment current from the full filter. The feedback controller then calculated the feedback 
strengths which were fed back into the full filter. The dynamics described by the full 
filter were then used to compute the codespace fidelity. Using a predictor-corrector
SDE integrator discussed in Appendix B and varying $\kappa$ and $\lambda_{\max}$ over a wide range,
we find essentially indistinguishable performance between the full and truncated
filters. Using $\kappa = 100\gamma$ and $\lambda_{\max} = 200\gamma$ as representative parameters, Figure 6.3(a)
demonstrates this general behavior by comparing the average codespace fidelity of a
handful of trajectories using the different filters. Integrating an individual trajectory 
takes approximately 39.5 seconds using a 2.1 GHz PowerPC processor. Integrating 
the full filter alone takes approximately 36 seconds, while integrating the truncated 
filter alone takes approximately 3.5 seconds. 

In addition to showing the identical performance of the full and truncated filters,
Fig. 6.3(a) also shows the loss in performance if one were to truncate further. The
31-dimensional filter comprises the 16 syndrome projectors and the 15 feedback
coefficients which have non-zero feedback matrix elements with the codespace projector $\Pi_0$.
These are the only elements explicitly needed to calculate the feedback strengths
in Eq. (6.4). This filter fails because it tacitly assumes the action of feedback on
the codespace is more "important" than on the other 15 syndrome spaces. Since
feedback impacts all syndrome spaces equally, we need to retain those terms in order
to properly maintain syndrome space probabilities. Intuitively, this suggests that the
136-dimensional filter is the best we can do using this heuristic truncation strategy.
For reference, Fig. 6.3(b) shows the average codespace fidelity of 2000 trajectories
when using the truncated filter.

Comparison with Discrete Error Correction 

Given the success of the truncation scheme, we now compare the performance of 
feedback-assisted error correction to that of discrete-time error correction for the 
five-qubit code. The discrete model considers qubits exposed to the depolarizing
channel

$$d\rho_{\text{discrete}} = \gamma \sum_{j=x,y,z} \sum_{m=1}^{n=5} \mathcal{D}[\sigma_j^{(m)}]\,\rho_{\text{discrete}}\,dt \tag{6.9}$$

up to a time $t$, after which discrete-time error correction is performed. The solution
of this master equation can be explicitly calculated using the ansatz

$$\rho_{\text{discrete}}(t) = \sum_{e=0}^{5} a_e(t) \sum_{P:\,\mathrm{pw}(P)=e} P \rho_0 P , \tag{6.10}$$

where $P$ is a tensor product of Pauli matrices and the identity. The function
$\mathrm{pw}(P)$ gives the Pauli weight of a matrix, defined as the number of $\sigma_x$, $\sigma_y$ and
$\sigma_z$ terms in the tensor representation. Thus, $a_0(t)$ is the coefficient of $\rho_0$ and
similarly $a_1(t)$ is the coefficient of all single-qubit errors from the initial state, e.g.,
$XIIII\,\rho_0\,XIIII$, $IIZII\,\rho_0\,IIZII$.

The codespace fidelity considered earlier is not a useful metric for comparison, 
as discrete-time error correction is guaranteed to restore the state to the codespace. 
Following Ahn, Doherty and Landahl, we instead use the codeword fidelity $F_{cw}(t) :=
\mathrm{Tr}[\rho_0\rho(t)]$, which is a measure relevant for a quantum memory. Since error correction
is independent of the encoded state, we choose the encoded $|0\rangle$ state as a fiducial
initial state. Given that the five-qubit code protects against only single-qubit errors,
we find that after error correction at time $t$, the codeword fidelity for discrete-time
error correction is

$$F_{cw}^{\text{discrete}}(t) = a_0(t) + a_1(t) = \frac{1}{256} e^{-20\gamma t} (3 + e^{4\gamma t})^4 (-3 + 4e^{4\gamma t}) , \tag{6.11}$$

which asymptotes to 1/64. This limit arises because prior to the stabilizer generator
measurements, the noise pushes the state to the maximally mixed state, which is
predominantly composed of the $a_2(t)$ through $a_5(t)$ terms.
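
The closed form in Eq. (6.11) can be cross-checked with a simple counting argument. The sketch below assumes that each Pauli error acts on each qubit at rate $\gamma$ with a dissipator of the form $\sigma\rho\sigma - \rho$, so that a given qubit is unaffected with probability $(1 + 3e^{-4\gamma t})/4$ and suffers one specific Pauli error with probability $(1 - e^{-4\gamma t})/4$. It also reads $a_1(t)$ in Eq. (6.11) as the total weight of the fifteen single-qubit error terms. Both readings are assumptions of this example, as are the function names.

```python
import math

def fidelity_closed_form(t, gamma=1.0):
    """Right-hand side of Eq. (6.11)."""
    e4 = math.exp(4.0 * gamma * t)
    return math.exp(-20.0 * gamma * t) * (3.0 + e4) ** 4 * (-3.0 + 4.0 * e4) / 256.0

def fidelity_from_counting(t, gamma=1.0):
    """Probability that at most one of the five qubits has suffered a Pauli error."""
    x = math.exp(-4.0 * gamma * t)
    p_none = (1.0 + 3.0 * x) / 4.0  # a given qubit is unaffected
    p_each = (1.0 - x) / 4.0        # one specific Pauli error on a given qubit
    return p_none ** 5 + 15.0 * p_none ** 4 * p_each

for t in (0.0, 0.05, 0.25, 1.0):
    print(t, fidelity_closed_form(t), fidelity_from_counting(t))
```

The two expressions agree for all times tried, start at 1 for $t = 0$, and approach $16/1024 = 1/64$ at long times, consistent with the stated asymptote.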

The feedback codeword fidelity $F_{cw}^{\text{feedback}}$ was calculated by integrating both the
full quantum filter (6.1), representing the underlying system of qubits, and the
truncated filter, representing the feedback controller. Again, we chose $\kappa = 100\gamma$,
$\lambda_{\max} = 200\gamma$ and dt = 10~^7 and used the same SDE integrator described above.
Figure 6.4 shows the average of $F_{cw}^{\text{feedback}}$ over 2000 trajectories, demonstrating that
there are regimes where feedback-assisted error correction can significantly outperform
discrete-time error correction. Feedback-assisted error correction appears to
approach an asymptotic codeword fidelity greater than what would be obtained by
decoherence followed by discrete-time error correction. Due to the nonlinear feedback,
it is difficult to calculate an analytic asymptotic expression for the continuous-time
strategy. Nonetheless, the improved performance for the timespan considered
suggests that better quantum memory is possible using the feedback scheme.

Figure 6.4: Comparison between continuous-time and discrete-time error correction
for the five-qubit code. For the continuous-time error correction simulations,
the codeword fidelity was averaged over 2,000 trajectories with $\kappa = 100\gamma$
and $\lambda_{\max} = 200\gamma$. (Color online.)



6.4 Summary 



Extending control theory techniques introduced by van Handel and Mabuchi [van 
Handel and Mabuchi 2005a], I have developed a computationally efficient feedback 
controller for continuous-time quantum error correction. For the truncation scheme,
the dimension of the filtering equations grows as $O(n^2)$ in the number of physical
qubits $n$, rather than $O(4^n)$ for the original Ahn, Doherty and Landahl procedure
[Ahn et al. 2002]. By numerical simulation of the five-qubit code, we have seen the
viability of such a filter for a quantum memory protecting against a depolarizing 
noise channel. Moreover, in all simulations, this performance is indistinguishable 
from that of the computationally more demanding filter of the ADL style. 

In systems where recovery operations are not instantaneous relative to decoher- 
ence, consideration suggests that it is desirable to perform syndrome measurement, 
recovery, and logic gates simultaneously. However, it is not immediately clear how 
gates impact the feedback controller. Indeed, if a Hamiltonian is in the code's nor- 
malizer, the continuous-time feedback protocol and its performance are unchanged. 
Though a universal set of such Hamiltonians can be found, it might be desirable to 
find universal gates which have physically simple interactions. Future work involves 
finding such gate sets and developing a framework for universal quantum comput- 
ing. Additional issues of fault-tolerance and robustness could then be explored within 
such a universal setup. Exploring feedback error correction in the context of specific 
physical models will provide opportunities to tailor feedback strategies to available 
control parameters and salient noise channels. Such systems might allow the calcu- 
lation of globally optimal feedback control strategies. 
Chapter 7

Model Reduction of Collective Qubit Dynamics

We saw in the previous chapter that the ability to find a low-dimensional model of a
collective quantum system allows one to efficiently simulate complex dynamics and,
in turn, design a practical feedback controller. In this chapter, I focus on a problem
outside the quantum feedback and control realm and present an exact, but nonetheless
computationally appealing, description of arbitrary collective processes on open
qubit systems. The work presented here was published in [Chase and Geremia 2008].

7.1 Introduction 

The ability to model the open system dynamics of large spin ensembles is crucial to 
experiments that make use of many atoms, as is often the case in precision metrol- 
ogy [Itano et al. 1993; Kominis et al. 2003], quantum information science [Chaudhury 
et al. 2007; Julsgaard et al. 2001; Kuzmich et al. 2003] and quantum optical simu- 
lations of condensed matter phenomena [Greiner et al. 2002; Morrison and Parkins 
2008; Sadler et al. 2006]. Unfortunately, the mathematical description of large atomic 
spin systems is complicated by the fact that the dimension of the Hilbert space $\mathcal{H}_N$
grows exponentially in the number of atoms $N$. Realistic simulations of experiments
quickly become intractable even for atom numbers smaller than $N \sim 10$. Current
experiments, however, often work with atom numbers of more than ~ 10^", meaning
that direct simulation of these systems is well beyond feasible. Moreover, simulations
over a range $N \sim 1$ to $10$ are far from adequate to discern even the qualitative
behavior that would be expected in the $N \gg 1$ limit. Fortunately, it is often the
case that experiments involving large spin ensembles respect one or more dynamical 
symmetries that can be exploited to reduce the effective dimension of the ensem- 
ble's Hilbert space. One can then hope to achieve a sufficiently realistic model of 
experiments without an exponentially large description of the system. 

In particular, previous work has focused on the symmetric collective states $|\psi_S\rangle$,
which are invariant under the permutation of particle labels: $\Pi_{ij}|\psi_S\rangle = |\psi_S\rangle$. These
states span the subspace $\mathcal{H}_S \subset \mathcal{H}_N$, which grows linearly with the number of
particles, $\dim(\mathcal{H}_S) = N + 1$. However, in order for $\mathcal{H}_S$ to be an invariant subspace, the
dynamics of the system must be expressible solely in terms of symmetric processes,
which are particle permutation invariant, and collective operators, which respect
the irreducible representation structure of rotations on the spin ensemble. Fortunately,
even within this restrictive class, a wide variety of phenomena may be observed,
including spin-squeezing [Hald et al. 1999; Kitagawa and Ueda 1993] and
zero-temperature phase transitions [Morrison and Parkins 2008].

In practice, symmetric atomic dynamics are achieved by ensuring that there is 
identical coupling between all the atoms in the ensemble and the electromagnetic 
fields (optical, magnetic, microwave, etc.) used to both drive and observe the system 
[Stockton et al. 2003]. This approximation can be quite good for all of the coherent 
dynamics, because with sufficient laboratory effort, electromagnetic intensities can 
be made homogeneous, ensuring that interactions do not distinguish between dif- 
ferent atoms in the ensemble. However, incoherent dynamics are often beyond the 
experimenter's control. Although most types of decoherence are symmetric, they 
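are not generally written using collective operators. Instead they are expressed as
identical Lindblad operators for each spin, i.e.

$$\mathcal{L}[s]\rho = \sum_{n=1}^{N} \left[ s^{(n)} \rho (s^{(n)})^\dagger - \tfrac{1}{2} (s^{(n)})^\dagger s^{(n)} \rho - \tfrac{1}{2} \rho (s^{(n)})^\dagger s^{(n)} \right] . \tag{7.1}$$

The fact that decoherence does not preserve $\mathcal{H}_S$ has been well appreciated and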
the standard practice in experiments that address the collective state of atomic en- 
sembles has been either: (i) to model such experiments only in a very short-time 
limit where decoherence can be approximately ignored; or (ii) to use decoherence 
models that do respect the particle symmetry, but which are written using only col- 
lective operators, even when doing so is not necessarily physically justified. In atomic 
spin ensembles, for example, a typical source of decoherence comes from spontaneous 
emission, yet collective radiative processes only occur under specific conditions such 
as superradiance from highly confined atoms [Dicke 1954] and some cavity-QED or 
spin-grating settings [Black et al. 2005]. 

In this chapter, I generalize the collective states of an ensemble of spin- 1/2 parti- 
cles (qubits) to include states that are preserved under symmetric — but not neces- 
sarily collective — transformations. Specifically, I generalize from the strict condition 
of complete permutation invariance to the broader class of states that are indistin- 
guishable across degenerate irreducible representations (irreps) of the rotation group. 
While the representation theory of the rotation group has been utilized in a wide 
variety of contexts, such as to protect quantum information from decoherence by 
encoding it into degenerate irreps with the same total angular momentum [Bacon 
et al. 2001; Lidar et al. 1998], I utilize relevant aspects of the representation theory 
to obtain a reduced-dimensional description of quantum maps that act locally but 
identically on every member of an ensemble of qubits. 

The main result, presented in Eq. (7.45), enables us to represent arbitrary sym- 
metric Lindblad operators in the collective state basis. We find that the dimension of 
the Hilbert space $\mathcal{H}_C$ spanned by these generalized collective states scales favorably,
$\dim(\mathcal{H}_C) \sim N^2$. This allows for efficient simulation of a broader class of collective
spin dynamics and in particular, allows one to consider the effects of decoherence
on previous simulations of symmetric collective spin states. We note that dynamical 
symmetries for spin-1/2 particles have been studied in the context of decoherence- 
free quantum information processing [Bacon et al. 2001; Lidar et al. 1998]. Unlike 
the work in this chapter, which uses symmetries to find a reduced description of a 
quantum system, these works seek to protect quantum information from decoherence 
by encoding within the degeneracies introduced by dynamical symmetries. 

The remainder of this chapter is organized as follows. Section 7.2 reviews the 
representation theory of the rotation group, which plays an important role in defining 
the symmetries related to $\mathcal{H}_S$ and $\mathcal{H}_C$. Section 7.3 introduces collective states
and Section 7.4 defines collective processes over these states. Section 7.5 gives an 
identity for expressing arbitrary symmetric superoperators, e.g. Eq. 7.1, over the 
collective states. Section 7.6 leverages this formalism to compare the effect of different 
decoherence models in non-classical atomic ensemble states. Section 7.7 concludes. 



7.2 General states of the ensemble 

Consider an ensemble of spin-1/2 particles, with the $n$th spin characterized by its
angular momentum $\mathbf{j}^{(n)} = \{j_x^{(n)}, j_y^{(n)}, j_z^{(n)}\}$. States of the spin ensemble are elements
of the composite Hilbert space

$$\mathcal{H}_N = \mathcal{H}^{(1)} \otimes \mathcal{H}^{(2)} \otimes \cdots \otimes \mathcal{H}^{(N)} \tag{7.2}$$

with $\dim(\mathcal{H}_N) = 2^N$. Pure states of the ensemble, $|\psi\rangle \in \mathcal{H}_N$, are written as

$$|\psi\rangle = \sum_{m_1,m_2,\ldots,m_N} c_{m_1,m_2,\ldots,m_N} |m_1,m_2,\ldots,m_N\rangle \tag{7.3}$$

with $m_n = \pm\frac{1}{2}$ and where

$$|m_1,m_2,\ldots,m_N\rangle = |\tfrac{1}{2},m_1\rangle_1 \otimes |\tfrac{1}{2},m_2\rangle_2 \otimes \cdots \otimes |\tfrac{1}{2},m_N\rangle_N \tag{7.4}$$

satisfies

$$j_z^{(n)} |m_1,m_2,\ldots,m_N\rangle = \hbar m_n |m_1,m_2,\ldots,m_N\rangle . \tag{7.5}$$

When studying the open-system dynamics of the spin ensemble, one must generally
consider the density operator

$$\rho = \sum_{m_1,\ldots,m_N} \sum_{m_1',\ldots,m_N'} \rho_{m_1,\ldots,m_N;\,m_1',\ldots,m_N'} |m_1,\ldots,m_N\rangle\langle m_1',\ldots,m_N'| . \tag{7.6}$$

States expanded as in Eqs. 7.5 and 7.6 are said to be written in the product basis. 



7.2.1 Representations of the Rotation Group 

For a single spin-1/2 particle, a spatial rotation through the Euler angles $R =
(\alpha, \beta, \gamma)$ is described by the rotation operator

$$R(\alpha, \beta, \gamma) = e^{-i\alpha j_z} e^{-i\beta j_y} e^{-i\gamma j_z} . \tag{7.7}$$

The basis kets $|\tfrac{1}{2},m\rangle$ for this particle therefore transform under the rotation $R$
according to

$$R|\tfrac{1}{2},m'\rangle = \sum_{m} \mathcal{D}^{1/2}_{m',m}(R)\,|\tfrac{1}{2},m\rangle \tag{7.8}$$

where the matrices $\mathcal{D}^{1/2}(R)$ have the elements

$$\mathcal{D}^{1/2}_{m',m} = \langle\tfrac{1}{2},m'|R(\alpha,\beta,\gamma)|\tfrac{1}{2},m\rangle . \tag{7.9}$$

The rotation matrices $\mathcal{D}^{1/2}(R)$ form a 2-dimensional representation of the rotation
group.

For the ensemble of $N$ spin-1/2 particles, each component of the ket $|\psi\rangle =
|m_1,m_2,\ldots,m_N\rangle$ transforms separately under a rotation so that an arbitrary state
transforms as

$$|\psi'\rangle = [\mathcal{D}^{1/2}(R)]^{\otimes N} |\psi\rangle . \tag{7.10}$$

The rotation matrices $\mathcal{D}(R) = [\mathcal{D}^{1/2}(R)]^{\otimes N}$ provide a reducible representation for the
rotation group but can be decomposed into irreducible representations (irreps) as

$$\mathcal{D}(R) = \bigoplus_{J=J_{\min}}^{J_{\max}} \bigoplus_{i=1}^{d_N^J} \mathcal{D}^{J,i}(R) . \tag{7.11}$$

The quantum number $i(J) = 1, 2, \ldots, d_N^J$ is used to distinguish between the

$$d_N^J = \frac{N!\,(2J+1)}{(\frac{N}{2} - J)!\,(\frac{N}{2} + J + 1)!} , \qquad J_{\min} \le J \le J_{\max} \tag{7.12}$$

degenerate irreps with total angular momentum $J$ [Mihailov 1977]. That is to say, $d_N^J$
is the number of ways one can combine $N$ spin-1/2 particles to obtain total angular
momentum $J$. The matrix elements of a given irrep $\mathcal{D}^{J,i}(R)$,

$$\mathcal{D}^{J,i}_{M,M'}(R) = \langle J,M,i|\mathcal{D}^{J,i}(R)|J,M',i\rangle , \tag{7.13}$$

are written in terms of the total angular momentum eigenstates

$$\mathbf{J}^2 |J,M,i\rangle = J(J+1)\,|J,M,i\rangle \tag{7.14}$$

$$J_z |J,M,i\rangle = M\,|J,M,i\rangle \tag{7.15}$$

with $\mathbf{J} = \sum_{n=1}^{N} \mathbf{j}^{(n)}$, $J_{\max} = \frac{N}{2}$ and

$$J_{\min} = \begin{cases} \frac{1}{2} & N \text{ odd} \\ 0 & N \text{ even.} \end{cases} \tag{7.16}$$

It is important to note that degenerate irreps have identical matrix elements, i.e.

$$\langle J,M,i|\mathcal{D}^{J,i}(R)|J,M',i\rangle = \langle J,M,i'|\mathcal{D}^{J,i'}(R)|J,M',i'\rangle \tag{7.17}$$

for all $i, i'$.
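
A quick consistency check on the multiplicities of Eq. (7.12) is that the irrep blocks must account for the full $2^N$-dimensional space, i.e. $\sum_J d_N^J (2J + 1) = 2^N$. The sketch below verifies this using exact integer arithmetic; indexing the irreps by the integer $k = N/2 - J$ is a convenience introduced here to avoid half-integer bookkeeping, and the function name is illustrative.

```python
from math import factorial

def multiplicity(N, k):
    """d_N^J of Eq. (7.12) for J = N/2 - k, with k = 0, 1, ..., floor(N/2)."""
    two_J_plus_1 = N - 2 * k + 1
    # N/2 - J = k and N/2 + J + 1 = N - k + 1
    return factorial(N) * two_J_plus_1 // (factorial(k) * factorial(N - k + 1))

for N in range(1, 9):
    total = sum(multiplicity(N, k) * (N - 2 * k + 1) for k in range(N // 2 + 1))
    print(N, total, 2 ** N)
```

For example, three spins give one $J = 3/2$ irrep and two degenerate $J = 1/2$ irreps, for a total dimension of $4 + 2 \cdot 2 = 8$.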

In this representation, pure states are written as

$$|\psi\rangle = \sum_{J=J_{\min}}^{J_{\max}} \sum_{M=-J}^{J} \sum_{i=1}^{d_N^J} c_{J,M,i} |J,M,i\rangle \tag{7.18}$$

and mixed states as

$$\rho = \sum_{J,J'=J_{\min}}^{J_{\max}} \; \sum_{M,M'=-J,-J'}^{J,J'} \; \sum_{i,i'=1}^{d_N^J,\,d_N^{J'}} \rho_{J,M,i;\,J',M',i'} |J,M,i\rangle\langle J',M',i'| . \tag{7.19}$$

States written in the form of Eqs. 7.18 or 7.19 are said to be written in the irrep
basis. We stress that both the product and irrep bases can describe any arbitrary
state in $\mathcal{H}_N$.



7.3 Collective States

While the representations in Section 7.2 allow us to express any state of the ensemble
of spin-1/2 particles, the irrep basis suggests a scenario in which we could restrict
attention to a much smaller subspace of $\mathcal{H}_N$. In particular, the irrep structure of the
rotation group, as expressed in Eq. 7.11, indicates that rotations on the ensemble
do not mix irreps and that degenerate irreps transform identically under a rotation.

Following this line of reasoning, we introduce the collective states, $|\psi_C\rangle$, which
span the sub-Hilbert space $\mathcal{H}_C \subset \mathcal{H}_N$. Collective states have the property that
degenerate irreps are identical; for pure states, $c_{J,M,i} = c_{J,M,i'}$ for all $i$ and $i'$. We note
that the symmetric collective states mentioned in the introduction are the collective
states with $c_{J,M,i} = 0$ unless $J = \frac{N}{2}$ and thus correspond to the largest $J$ value irrep.
We also note that

$$\dim \mathcal{H}_C = \sum_{J=J_{\min}}^{J_{\max}} (2J + 1) = \begin{cases} \frac{1}{4}(N+3)(N+1) & \text{if } N \text{ odd} \\ \frac{1}{4}(N+2)^2 & \text{if } N \text{ even.} \end{cases} \tag{7.20}$$

Physically, the collective states reflect an inability to address different degenerate
irreps of the same total $J$. This new symmetry allows us to effectively ignore the
quantum number $i$ and write

$$|\psi_C\rangle = \sum_{J=J_{\min}}^{J_{\max}} \sum_{M=-J}^{J} \sqrt{d_N^J}\; c_{J,M} |J,M\rangle \tag{7.21}$$

where I have defined effective basis kets

$$|J,M\rangle = \frac{1}{\sqrt{d_N^J}} \sum_{i=1}^{d_N^J} |J,M,i\rangle \tag{7.22}$$

with effective amplitude $c_{J,M} = c_{J,M,i}$ for all $i$ (since the $c_{J,M,i}$ are equal for collective
states).

The factor of $\sqrt{d_N^J}$ serves as normalization, so that we can apply standard spin-$J$
operators to the effective kets without explicitly referencing their constituent
degenerate irrep kets $|J,M,i\rangle$. In other words, $|J,M\rangle$ actually represents $d_N^J$ degenerate
kets, each with identical probability amplitude coefficients. But since the matrix
elements of a spin-$J$ operator are identical for irreps, we need not evaluate them
individually.

As an example, consider a rotation operator $R$ which necessarily respects the
irrep structure of the rotation group. Calculating the expectation value of $R$ by
expanding the collective state $|\psi_C\rangle$ in the full irrep basis, we have

$$\langle\psi_C|R|\psi_C\rangle = \sum_{J,J'} \sum_{M,M'} \sum_{i,i'} c^*_{J,M,i}\, c_{J',M',i'} \langle J,M,i|R|J',M',i'\rangle \tag{7.23}$$

$$= \sum_{J} \sum_{M,M'} \sum_{i} c^*_{J,M,i}\, c_{J,M',i} \langle J,M,i|R|J,M',i\rangle \tag{7.24}$$

$$= \sum_{J} \sum_{M,M'} d_N^J\, c^*_{J,M}\, c_{J,M'} \langle J,M|R|J,M'\rangle \tag{7.25}$$

where in going from Eq. 7.23 to 7.24, we set $J = J'$ and $i = i'$ since rotation
group elements do not mix irreps. In reaching Eq. 7.25, I have further used the
collective state property that $c_{J,M,i} = c_{J,M,i'}\ \forall\, i, i'$ and the rotation irrep property that
$\langle J,M,i|R|J,M',i\rangle = \langle J,M,i'|R|J,M',i'\rangle\ \forall\, i, i'$ to drop the index $i$.

Equivalently we can evaluate the expectation using the effective basis kets $|J,M\rangle$
directly:

$$\langle\psi_C|R|\psi_C\rangle = \sum_{J,J'} \sum_{M,M'} \sqrt{d_N^J}\sqrt{d_N^{J'}}\; c^*_{J,M}\, c_{J',M'} \langle J,M|R|J',M'\rangle \tag{7.26}$$

$$= \sum_{J} \sum_{M,M'} d_N^J\, c^*_{J,M}\, c_{J,M'} \langle J,M|R|J,M'\rangle . \tag{7.27}$$

Comparing this to Eq. 7.25 and recalling that $c_{J,M} = c_{J,M,i}$ for all $i$, we see that the
effective calculation gives the same result.

We can similarly define collective state density operators, $\rho_C$, which have the
properties that (i) there are no coherences between different irrep blocks and (ii)
degenerate irrep blocks have identical density matrix elements. The second assumption
again means we can effectively drop the index $i$, since $\rho_{J,M,i;\,J,M',i} = \rho_{J,M,i';\,J,M',i'}$ for
any $i$ and $i'$. This allows us to write

$$\rho_C = \sum_{J=J_{\min}}^{J_{\max}} \sum_{M,M'=-J}^{J} \rho_{J,M;\,J,M'}\; \overline{|J,M\rangle\langle J,M'|} \tag{7.28}$$

where the effective density matrix elements, written using an overlined outer product,
are related to the irrep matrix elements via

$$\rho_{J,M;\,J,M'}\; \overline{|J,M\rangle\langle J,M'|} := \frac{1}{d_N^J} \sum_{i=1}^{d_N^J} \rho_{J,M,i;\,J,M',i} |J,M,i\rangle\langle J,M',i| . \tag{7.29}$$

Just as for the effective kets, the normalization factor of $d_N^J$ ensures expectations are
correctly calculated using the standard spin-$J$ operators. The density matrix has
$\sum_J (2J + 1)^2 = \frac{1}{6}(N + 3)(N + 2)(N + 1)$ elements.
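
To make the scaling concrete, the sketch below counts the effective kets and effective density matrix elements directly from the allowed values of $2J + 1$ and compares them with the $2^N$ and $4^N$ sizes of the product-basis description. The function name and the choice of sample $N$ are illustrative only.

```python
def collective_dims(N):
    """Return (number of effective kets, number of effective density matrix elements)."""
    two_J_plus_1 = range(N + 1, 0, -2)  # 2J + 1 for J = N/2, N/2 - 1, ..., J_min
    ket_count = sum(two_J_plus_1)
    matrix_elements = sum(d * d for d in two_J_plus_1)
    return ket_count, matrix_elements

for N in (2, 10, 50):
    kets, elements = collective_dims(N)
    print(N, kets, elements, 2 ** N, 4 ** N)
```

The element count agrees with $(N + 3)(N + 2)(N + 1)/6$, so the collective description grows only polynomially in $N$ while the product-basis description grows exponentially.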

We stress that the overlined outer product notation is different than naively taking
the outer product of the effective kets defined in Eq. 7.22. Such an approach
would involve outer products of kets between different, although degenerate, irreps.
Such terms are strictly forbidden by the first property of collective state density
operators. Instead, one should consider the effective density operator as a representation
of $d_N^J$ identical copies of a spin-$J$ particle. The overline notation is meant to remind
the reader that the outer product beneath should only be interpreted using Eq. 7.29 
to relate back to the irrep basis. 

7.4 Collective Processes 

We are now interested in describing quantum processes, $\mathcal{L}$, which preserve collective
states, $\rho_C' = \mathcal{L}\rho_C$. Writing this explicitly, we must have

$$\sum_{J_1} \sum_{M_1,M_1'} \rho'_{J_1,M_1;\,J_1,M_1'}\; \overline{|J_1,M_1\rangle\langle J_1,M_1'|} = \sum_{J_2} \sum_{M_2,M_2'} \rho_{J_2,M_2;\,J_2,M_2'}\; \mathcal{L}\, \overline{|J_2,M_2\rangle\langle J_2,M_2'|} . \tag{7.30}$$

If we define the action of $\mathcal{L}$ on collective density matrix elements as

$$f_{J,M,M'} = \mathcal{L}\, \overline{|J,M\rangle\langle J,M'|} \tag{7.31}$$

we immediately see that this action must be expressible as

$$f_{J,M,M'} = \sum_{J_1} \sum_{M_1,M_1'} \lambda^{J,M,M'}_{J_1,M_1,M_1'}\; \overline{|J_1,M_1\rangle\langle J_1,M_1'|} \tag{7.32}$$

in order for the equality in Eq. 7.30 to be met. Here $\lambda^{J,M,M'}_{J_1,M_1,M_1'}$ is an arbitrary function
of its indices. Any process which preserves collective states by satisfying Eq. 7.32 is
a collective process.

Examples of collective processes are those involving collective angular momentum
operators $\{J_x, J_y, \ldots\}$ and more generally, arbitrary collective operators $C =
\sum_{n=1}^{N} c^{(n)}$. Since collective operators correspond to precisely the rotations considered
when defining the irrep structure of the rotation group, they can all be written
as

$$C = \sum_{J} \sum_{M,M'} c_{J,M,M'}\; \overline{|J,M\rangle\langle J,M'|} , \tag{7.33}$$

which cannot couple effective matrix elements with different $J$.


However, the collective operators define a more restrictive class than an arbitrary
collective process, which can couple different $J$ blocks, so long as it does not create
coherences between them. In fact, if all operators are collective, then the symmetric
collective states $\{|\psi_S\rangle\}$ span an invariant subspace of the map. This holds even when
considering Lindblad operators that are written in terms of collective operators,

$$\mathcal{L}[S]\rho = S\rho S^\dagger - \tfrac{1}{2}S^\dagger S\rho - \tfrac{1}{2}\rho S^\dagger S , \tag{7.34}$$

where $S = \sum_n s^{(n)}$.



In the following section, I demonstrate that a process of the form

$$f_{J,M,M'} = \sum_{n=1}^{N} s^{(n)}\,\overline{|J,M\rangle\langle J,M'|}\,(t^{(n)})^\dagger , \tag{7.35}$$

which cannot be written solely in terms of collective operators, is nonetheless a
collective process. Moreover, if we expand the operators in the spherical Pauli basis
via $s = \vec{s}\cdot\vec{\sigma}$ and $t^\dagger = \vec{t}^{\,*}\cdot\vec{\sigma}^\dagger$, we find

$$f_{J,M,M'} = \vec{s}\cdot g(J,M,M',N)\cdot\vec{t}^{\,*} \tag{7.36}$$

with the tensor $g(J,M,M',N)$ defined as

$$g_{qr}(J,M,M',N) = \sum_{n=1}^{N} \sigma_q^{(n)}\,\overline{|J,M\rangle\langle J,M'|}\,(\sigma_r^{(n)})^\dagger . \tag{7.37}$$

The tensor is written as a function of $N$ to coincide with the notation in the following
section.

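Before deriving a closed form expression for $g(J,M,M',N)$, we would like to
relate it to modeling symmetric decoherence processes, which take the form

$$\mathcal{L}[s]\rho = \sum_{n=1}^{N}\left[ s^{(n)}\rho\,(s^{(n)})^\dagger - \tfrac{1}{2}(s^{(n)})^\dagger s^{(n)}\rho - \tfrac{1}{2}\rho\,(s^{(n)})^\dagger s^{(n)} \right] . \tag{7.38}$$

In order to relate $\mathcal{L}[s]$ to Eqs. 7.35 and 7.37, set $t = s$ and expand the single spin
operator $s$ in the spherical Pauli basis

$$s = s_I I + s_+\sigma_+ + s_-\sigma_- + s_z\sigma_z \tag{7.39}$$

with the convention $\hbar = 1$,
$\sigma_+ = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$,
$\sigma_- = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$ and
$\sigma_z = \begin{pmatrix} \tfrac{1}{2} & 0 \\ 0 & -\tfrac{1}{2} \end{pmatrix}$. The symmetric
Lindblad of Eq. 7.38 can be expanded as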



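$$\mathcal{L}[s]\rho = \sum_{n=1}^{N}\left[ s^{(n)}\rho\,(s^{(n)})^\dagger - \tfrac{1}{2}(s^{(n)})^\dagger s^{(n)}\rho - \tfrac{1}{2}\rho\,(s^{(n)})^\dagger s^{(n)} \right]
 = \sum_{n=1}^{N} s^{(n)}\rho\,(s^{(n)})^\dagger - \tfrac{1}{2}S_N\rho - \tfrac{1}{2}\rho S_N \tag{7.40}$$

with the collective operator $S_N$ given by

$$\begin{aligned}
S_N = \sum_{n=1}^{N} (s^{(n)})^\dagger s^{(n)}
 &= \left( |s_I|^2 + \tfrac{1}{2}|s_+|^2 + \tfrac{1}{2}|s_-|^2 + \tfrac{1}{4}|s_z|^2 \right) N I \\
 &\quad + \left( s_I^* s_+ - \tfrac{1}{2}s_-^* s_z + \tfrac{1}{2}s_z^* s_+ + s_-^* s_I \right) J_+ \\
 &\quad + \left( s_I^* s_- + s_+^* s_I + \tfrac{1}{2}s_+^* s_z - \tfrac{1}{2}s_z^* s_- \right) J_- \\
 &\quad + \left( s_I^* s_z + s_z^* s_I - |s_+|^2 + |s_-|^2 \right) J_z
\end{aligned} \tag{7.41}$$

and $J_q = \sum_{n=1}^{N} \sigma_q^{(n)}$ a collective spin operator.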
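Because the single-particle sum in Eq. 7.41 is linear in $n$, the expansion of $S_N$ can be checked on a single spin. The sketch below is an illustration rather than part of the original derivation. It assumes the matrix convention $\sigma_+ = |{\uparrow}\rangle\langle{\downarrow}|$, $\sigma_- = |{\downarrow}\rangle\langle{\uparrow}|$, $\sigma_z = \mathrm{diag}(\tfrac12, -\tfrac12)$ and uses randomly drawn coefficients; the variable names are hypothetical.

```python
import numpy as np

# Spherical Pauli basis for one spin-1/2 (assumed convention, hbar = 1)
I2 = np.eye(2, dtype=complex)
sp = np.array([[0, 1], [0, 0]], dtype=complex)        # sigma_+
sm = np.array([[0, 0], [1, 0]], dtype=complex)        # sigma_-
sz = np.array([[0.5, 0], [0, -0.5]], dtype=complex)   # sigma_z

# Arbitrary complex expansion coefficients s_I, s_+, s_-, s_z
rng = np.random.default_rng(0)
s_i, s_p, s_m, s_z = rng.normal(size=4) + 1j * rng.normal(size=4)
s = s_i * I2 + s_p * sp + s_m * sm + s_z * sz

# Coefficients of I, sigma_+, sigma_-, sigma_z in s^dagger s
c_id = abs(s_i) ** 2 + 0.5 * abs(s_p) ** 2 + 0.5 * abs(s_m) ** 2 + 0.25 * abs(s_z) ** 2
c_p = np.conj(s_i) * s_p - 0.5 * np.conj(s_m) * s_z + 0.5 * np.conj(s_z) * s_p + np.conj(s_m) * s_i
c_m = np.conj(s_i) * s_m + np.conj(s_p) * s_i + 0.5 * np.conj(s_p) * s_z - 0.5 * np.conj(s_z) * s_m
c_z = np.conj(s_i) * s_z + np.conj(s_z) * s_i - abs(s_p) ** 2 + abs(s_m) ** 2

expansion = c_id * I2 + c_p * sp + c_m * sm + c_z * sz
expansion_matches = np.allclose(s.conj().T @ s, expansion)
print(expansion_matches)
```

Summing the single-spin result over $n$ replaces $I$ by $N I$ and each $\sigma_q$ by the collective operator $J_q$, which is how only collective operators appear in $S_N$.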

In this form, it is clear that only the first term of the symmetric Lindbladian is
not written using collective operators. In fact, if we again expand $s^{(n)}$ in the spherical
basis, we observe that the only terms which involve non-collective operators are those
which do not involve the identity operator,

$$\sum_{n=1}^{N} s^{(n)}\rho\,(s^{(n)})^\dagger = |s_I|^2 N\rho
 + \sum_{q}\left( s_q s_I^* J_q\rho + s_I s_q^*\rho J_q^\dagger \right)
 + \sum_{n=1}^{N}\sum_{q,r} s_q s_r^*\,\sigma_q^{(n)}\rho\,(\sigma_r^{(n)})^\dagger . \tag{7.42}$$

The last term here is precisely the tensor evaluation of $\vec{s}\cdot g(J,M,M',N)\cdot\vec{s}^{\,*}$. We
now proceed to give an identity for the tensor elements.

7.5 Identity



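Identity 1. Given a collective density matrix element for $N$ spin-1/2 particles,
$\overline{|J,M\rangle\langle J,M'|}$, we have

\begin{align}
g_{qr}(J,M,M',N) & \tag{7.43}\\
 &= \sum_{n=1}^{N} \sigma_q^{(n)}\,\overline{|J,M\rangle\langle J,M'|}\,(\sigma_r^{(n)})^\dagger \tag{7.44}\\
 &= \frac{1}{2J}\left[ 1 + \frac{\alpha_N^{J+1}}{d_N^J}\,\frac{2J+1}{J+1} \right]
    A_q^{J,M}\,\overline{|J,M_q\rangle\langle J,M_r'|}\,A_r^{J,M'} \notag\\
 &\quad + \frac{\alpha_N^{J}}{d_N^J\,2J}\,
    B_q^{J,M}\,\overline{|J-1,M_q\rangle\langle J-1,M_r'|}\,B_r^{J,M'} \notag\\
 &\quad + \frac{\alpha_N^{J+1}}{d_N^J\,2(J+1)}\,
    D_q^{J,M}\,\overline{|J+1,M_q\rangle\langle J+1,M_r'|}\,D_r^{J,M'} , \tag{7.45}
\end{align}

where $q, r \in \{+, -, z\}$, $M_+ = M + 1$, $M_- = M - 1$ and $M_z = M$,

$$\alpha_N^J = \sum_{J'=J}^{N/2} d_N^{J'} = \frac{N!}{(\frac{N}{2}-J)!\,(\frac{N}{2}+J)!} \tag{7.46}$$

and

\begin{align}
A_+^{J,M} &= \sqrt{(J-M)(J+M+1)} \tag{7.47a}\\
A_-^{J,M} &= \sqrt{(J+M)(J-M+1)} \tag{7.47b}\\
A_z^{J,M} &= M \tag{7.47c}
\end{align}

and

\begin{align}
B_+^{J,M} &= \sqrt{(J-M)(J-M-1)} \tag{7.48a}\\
B_-^{J,M} &= -\sqrt{(J+M)(J+M-1)} \tag{7.48b}\\
B_z^{J,M} &= \sqrt{(J+M)(J-M)} \tag{7.48c}
\end{align}

and lastly

\begin{align}
D_+^{J,M} &= -\sqrt{(J+M+1)(J+M+2)} \tag{7.49a}\\
D_-^{J,M} &= \sqrt{(J-M+1)(J-M+2)} \tag{7.49b}\\
D_z^{J,M} &= \sqrt{(J+M+1)(J-M+1)} . \tag{7.49c}
\end{align}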

Note that $\alpha_N^J$ and $d_N^J$ are zero if $J$ is negative or $J > N/2$, ensuring that only valid
density matrix elements are involved.
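A quick consistency check of Identity 1, offered as a sketch rather than as part of the proof: for $q = r$ and $M = M'$, every overlined outer product has unit trace, so the trace of the right-hand side must equal $\sum_n \mathrm{tr}[(\sigma_q^{(n)})^\dagger\sigma_q^{(n)}\rho]$. Assuming $\sigma_z = \mathrm{diag}(\tfrac12,-\tfrac12)$, this equals $N/4$ for $q = z$, $N/2 - M$ for $q = +$ and $N/2 + M$ for $q = -$. The code evaluates both sides in exact rational arithmetic; it takes $d_N^J = \alpha_N^J - \alpha_N^{J+1}$, which follows from Eq. 7.46, and all function names are illustrative.

```python
from fractions import Fraction
from math import factorial


def alpha(n, j):
    """alpha_N^J = N! / ((N/2 - J)! (N/2 + J)!), zero outside 0 <= J <= N/2."""
    half = Fraction(n, 2)
    if j < 0 or j > half:
        return Fraction(0)
    return Fraction(factorial(n), factorial(int(half - j)) * factorial(int(half + j)))


def degeneracy(n, j):
    """d_N^J obtained from the partial sums: alpha_N^J - alpha_N^{J+1}."""
    return alpha(n, j) - alpha(n, j + 1)


def squared_coeffs(j, m, q):
    """Return (A_q^2, B_q^2, D_q^2) from Eqs. 7.47-7.49."""
    if q == "+":
        return ((j - m) * (j + m + 1), (j - m) * (j - m - 1), (j + m + 1) * (j + m + 2))
    if q == "-":
        return ((j + m) * (j - m + 1), (j + m) * (j + m - 1), (j - m + 1) * (j - m + 2))
    return (m * m, (j + m) * (j - m), (j + m + 1) * (j - m + 1))


def trace_rhs(n, j, m, q):
    """Trace of the right-hand side of Eq. 7.45 for q = r and M = M'."""
    a2, b2, d2 = squared_coeffs(j, m, q)
    d = degeneracy(n, j)
    total = alpha(n, j + 1) / (d * 2 * (j + 1)) * d2
    if j > 0:  # the A and B coefficients vanish identically for J = 0
        total += (1 + alpha(n, j + 1) / d * (2 * j + 1) / (j + 1)) / (2 * j) * a2
        total += alpha(n, j) / (d * 2 * j) * b2
    return total


def trace_lhs(n, m, q):
    """Sum over particles of tr[sigma_q^dagger sigma_q rho] for a state with J_z = M."""
    return {"z": Fraction(n, 4), "+": Fraction(n, 2) - m, "-": Fraction(n, 2) + m}[q]


def check(max_spins=8):
    for n in range(1, max_spins + 1):
        j = Fraction(n, 2)
        while j >= 0:
            for k in range(int(2 * j) + 1):
                m = -j + k
                for q in "+-z":
                    assert trace_rhs(n, j, m, q) == trace_lhs(n, m, q), (n, j, m, q)
            j -= 1
    return True


print(check())
```

Agreement of the traces does not prove the Identity, but it is a sensitive test of the prefactors, since a wrong coefficient on any of the three terms would break the equality for some $(N, J, M)$.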

In the following subsections, we prove Identity 1 inductively. The motivation
for the inductive proof comes from the simple recursive structure of adding spin-1/2
particles. As seen in Fig. 7.1, the $d_N^J$ irreps which correspond to a total spin $J$
particle composed of $N$ spin-1/2 particles can be split into two groups, depending
on how angular momentum was added to reach them. By expressing the $N$ particle
states in terms of bipartite states of a single spin-1/2 particle and a spin-$(N-1)$
particle, we can then evaluate the dynamics independently on either half by assuming
Identity 1 holds. Returning the resulting state to the $N$ particle basis should then
confirm the Identity. By inspection, the base case of $N = 1$ holds, as the $A_q^{J,M}$ terms
reduce to the single spin-1/2 matrix elements. We now proceed to the inductive case.



7.5.1 Recursive state structure 

In order to apply the inductive hypothesis, we need to express an $N$ particle state
in terms of $N-1$ particle states. This recursive structure is best seen by examining
Fig. 7.1, which illustrates the branching structure for adding spin-1/2 particles. For
example, the three-fold degenerate $N = 4$ spin-1 irreps arise from two different spin
additions: adding a single spin-1/2 particle to the non-degenerate $J = 3/2$, $N = 3$
irrep and adding to the 2-fold degenerate $J = 1/2$, $N = 3$ irreps. Since we are always
adding a spin-1/2 particle, the tree is at most binary. This allows us to recursively
decompose the degenerate irreps for a given $J$ in terms of adding a single spin-1/2
particle to the two related $N-1$ degenerate irreps.

Figure 7.1: Degeneracy structure from adding spin-1/2 particles, shown for
$N = 1, 2, 3, 4, \ldots$ (diagram not recoverable; surviving node labels include
1 × 2, 3 × 1 and 2 × 0).
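The binary branching described above implies the recursion $d_N^J = d_{N-1}^{J+1/2} + d_{N-1}^{J-1/2}$. The following sketch checks it numerically. It assumes the standard multiplicity formula $d_N^J = \binom{N}{N/2-J} - \binom{N}{N/2-J-1}$, which is not restated in this section, and stores $2J$ as an integer; the function name is illustrative.

```python
from math import comb


def degeneracy(n_spins: int, two_j: int) -> int:
    """Number of spin-J irreps (J = two_j / 2) in the product of N spin-1/2 particles."""
    if two_j < 0 or two_j > n_spins or (n_spins - two_j) % 2:
        return 0
    k = (n_spins - two_j) // 2          # k = N/2 - J
    return comb(n_spins, k) - (comb(n_spins, k - 1) if k > 0 else 0)


# Each spin-J irrep at N comes from a spin-(J + 1/2) or spin-(J - 1/2) irrep at N - 1
for n in range(1, 13):
    for two_j in range(n % 2, n + 1, 2):
        parents = degeneracy(n - 1, two_j + 1) + degeneracy(n - 1, two_j - 1)
        assert degeneracy(n, two_j) == parents

# Example from the text: N = 4, J = 1 (three-fold) from N = 3, J = 3/2 and J = 1/2
print(degeneracy(4, 2), degeneracy(3, 3), degeneracy(3, 1))
```

As a further sanity check, $\sum_J d_N^J (2J+1) = 2^N$ recovers the full Hilbert space dimension.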

Recall that for the collective states, we defined effective density matrix elements
which group degenerate irreps (Eq. 7.29). In order to make the relationship between
states of different $N$ clear, in this section we will add the index $N$ to all effective
density matrix elements, $\overline{|J,M,N\rangle\langle J,M',N|}$. Similarly, when expressing the
collective state in the irrep basis, we will also use kets with the index $N$, i.e. $|J,M,N,i\rangle$.
Here, the $N$ and $i$ indices indicate the state is from the $i$-th degenerate total spin-$J$ irrep
that comes from adding $N$ spin-1/2 particles. So that we can leverage the binary
branching structure seen in Fig. 7.1, we also need to relate the $N$ particle irrep states
to the $N-1$ particle irrep states. Accordingly, we define
$|J,M;\tfrac{1}{2},J\pm\tfrac{1}{2},N-1,i_1\rangle$,
where the last four entries indicate that the overall $N$ spin state can be viewed as
combining a single spin-1/2 particle with a spin $J\pm\tfrac{1}{2}$ particle. The spin $J\pm\tfrac{1}{2}$
particle is from the $i_1$-st such irrep for $N-1$ spin-1/2 particles. With these definitions,
we can now relate the $N$ particle states to the $N-1$ particle states by explicitly
tensoring out a single spin-1/2 particle:



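\begin{align}
\overline{|J,M,N\rangle\langle J,M',N|}
 &= \frac{1}{d_N^J}\sum_{i=1}^{d_N^J} |J,M,N,i\rangle\langle J,M',N,i| \tag{7.50}\\
 &= \frac{1}{d_N^J}\sum_{i_1=1}^{d_{N-1}^{J+1/2}}
    |J,M;\tfrac{1}{2},J+\tfrac{1}{2},N-1,i_1\rangle\langle J,M';\tfrac{1}{2},J+\tfrac{1}{2},N-1,i_1| \notag\\
 &\quad + \frac{1}{d_N^J}\sum_{i_2=1}^{d_{N-1}^{J-1/2}}
    |J,M;\tfrac{1}{2},J-\tfrac{1}{2},N-1,i_2\rangle\langle J,M';\tfrac{1}{2},J-\tfrac{1}{2},N-1,i_2| \tag{7.51}\\
 &= \frac{d_{N-1}^{J+1/2}}{d_N^J}\sum_{m_1,m_1'}
    C^{J,M}_{\frac{1}{2},m_1;J+\frac{1}{2},M-m_1}\,C^{J,M'}_{\frac{1}{2},m_1';J+\frac{1}{2},M'-m_1'}\,
    |\tfrac{1}{2},m_1\rangle\langle\tfrac{1}{2},m_1'| \otimes
    \overline{|J+\tfrac{1}{2},M-m_1,N-1\rangle\langle J+\tfrac{1}{2},M'-m_1',N-1|} \notag\\
 &\quad + \frac{d_{N-1}^{J-1/2}}{d_N^J}\sum_{m_2,m_2'}
    C^{J,M}_{\frac{1}{2},m_2;J-\frac{1}{2},M-m_2}\,C^{J,M'}_{\frac{1}{2},m_2';J-\frac{1}{2},M'-m_2'}\,
    |\tfrac{1}{2},m_2\rangle\langle\tfrac{1}{2},m_2'| \otimes
    \overline{|J-\tfrac{1}{2},M-m_2,N-1\rangle\langle J-\tfrac{1}{2},M'-m_2',N-1|} , \tag{7.52}
\end{align}

with Clebsch-Gordan coefficients
$C^{J,M}_{j_1,m_1;j_2,m_2} = \langle J,M;j_1,j_2|j_1,m_1;j_2,m_2\rangle$ and the
$m_i, m_i'$ sums over single spin projection values $\pm\tfrac{1}{2}$. In reaching Eq. 7.52, we made
use of the definition of the effective density matrix element for $N-1$ spins given in
Eq. 7.29. With this recursive state definition, we can now start the inductive step
of the proof.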



7.5.2 Applying inductive hypothesis 

In order to prove the Identity, we must be able to apply the inductive hypothesis
to Eq. 7.44. Ignoring the Clebsch-Gordan coefficients for the moment, consider an
arbitrary term from Eq. 7.52. The dynamics distribute as

\begin{align}
\sum_{n=1}^{N} \sigma_q^{(n)} &\left[ |\tfrac{1}{2},m_i\rangle\langle\tfrac{1}{2},m_i'| \otimes
   \overline{|J\pm\tfrac{1}{2},M-m_i,N-1\rangle\langle J\pm\tfrac{1}{2},M'-m_i',N-1|} \right] (\sigma_r^{(n)})^\dagger \notag\\
 &= g_{qr}(\tfrac{1}{2},m_i,m_i',1) \otimes
   \overline{|J\pm\tfrac{1}{2},M-m_i,N-1\rangle\langle J\pm\tfrac{1}{2},M'-m_i',N-1|} \tag{7.53}\\
 &\quad + |\tfrac{1}{2},m_i\rangle\langle\tfrac{1}{2},m_i'| \otimes
   g_{qr}(J\pm\tfrac{1}{2},M-m_i,M'-m_i',N-1) . \tag{7.54}
\end{align}

By extension, all terms in Eq. 7.52 split the dynamics in this manner, which
allows us to apply the inductive hypothesis to evaluate $g_{qr}(\tfrac{1}{2},m_i,m_i',1)$ and
$g_{qr}(J\pm\tfrac{1}{2},M-m_i,M'-m_i',N-1)$. This means evaluating the $g_{qr}$ terms according to the
hypothesis in Eq. 7.45, after which we rewrite the bipartite states in the $N$ spin
basis.

We have the $g_{qr}(\tfrac{1}{2},m_1,m_1',1)$ terms



'^iV-l J+l 



X] X] 



ii=l Ji=J mi '- 
J+l 

J[=J m[ 



^ J+l,M-mi .J+\,M-mi 



|^i,M,;^,J+^,iV-l,2i) 



/r M'-- 1 ^- N-^ jAJM' r^z'^^'i Ji,M;-r^'"'^^- 4^'™! 



(7.55) 



and the ggr(|,m2,m2, 1) terms 



E E E 



di 



N 



i2 = l J2 = J-1 "12 



4 2'™2J2,M,/^2''^2, J,Mr<2'"^2 
7 1 '-^7 1 



J--i,A4'-m2 



J-i,M-?Tt2 



|J2,M,;^,J-^,iV-l,22) 



X 



J2=J— 1 r7i2 



(J2, -, J - -, 7V - 1, ^2| Oj_i_jv/'-m^, ^J-i.Af'-m^^'- 



(7.56) 



The g,qr{J +\,M - mi, M' - m'l, - 1) terms are 
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J+i 

J[=J ra\ 



J+ 



1 + 



CiN- l 2 J + 2 



-I "^AT-l J+l 



ii=l Ji=J mi 



i ,M-mi ,Mq ^ § ."^1 



(Ji>/;;i,j + i,A'-i,.,K'"''ci;";;„,_, 



J+\,M'-m\ 



(7.57) 



+ 



' 'A-i ^ 

d'ifd]\[-i'^{J + |) n=i Ji=J-i mi 



^ J+ i ,M-mi ^ i ,mi 



J-i,Mq-mi 



J,M^5.'"i 17 /i^.l 7_i - N 



X 



J[=J-1 m\ 



7(,m;^|Xi ^5/+^^'— i (7.58) 

2 ' r '^j^ 



+ 



a 



•^+1 

AT-l 



/+2 

%-i J+2 



,J ,J+f r,/ T , 3\ 
"jV'^A/'-l^W ~r 2/ ^l^-*^ Ji=J+l rrti 



^J+|,M-mi jj_jV^^^i,mi 



J+f ,M„-mi 



J+2 

X E E 

J(=J+1 m'l 



I r M'- - 7 + - AT - 1 iAJM'nl^'^'^ 



and lastly, the gqr{J — ^, M — 1712, M' — N — 1) terms are 



(7.59) 



,7 



1 [1 I '^■^-^^ j 



^ J- i ,M-m2 J2,Mq(jh™-2 

^ J-|,Mq-m2 
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\J2,Mg;-,J - -,N -1,12) 



X 



E E 

J2=J-1 m; 



(7.60) 



+ 



^-1 

N-l 



'^iV-l J-l 



ttE E E 

n J 42 = 1 J2 = J-2 m2 



5, 



J-|,Af-m2j2,M,^|.'»2 



J-f ,M,-m2 



J-l 

X E E 

J2=J— 2 mi, 



j^J-\,M'-ra'^ 



(7.61) 



J+1 

Q'Af-i ^ ^ 

»2 = 1 J2=J m2 



J- 1 ,M-m2 J2 , A/q ^ I 



J.Af^S'^a , , , , 1 , 1 



J+1 



X 



EE 



1^1 ,-.,1^/ 



l^2,M;;-,J+-,iV-l,.,)^--'cr|V^, 



J^,Af;^5'™2 „J-|,Af-mi, 



(7.62) 



7.5.3 Evaluate sums 



We are now tasked with showing that Eqs. 7.55-7.62 sum to $g_{qr}(J,M,M',N)$ as
written in Eq. 7.45. Before doing so, we observe that the $J_i, m_i$ and $J_i', m_i'$ sums
factor in all the equations above. Moreover, if one replaces primed quantities with
unprimed ones, the Clebsch-Gordan and $A, B, D$ coefficients of the kets in a given
$J_i, m_i$ sum are identical to those of the bras in the related $J_i', m_i'$ sum. Therefore, we
focus on simplifying the unprimed sums and then apply those results to the primed
sums in order to simplify Eqs. 7.55-7.62. In Appendix 7.A, we explicitly calculate
two representative sums from these equations. The calculations involve manipulating
products of Clebsch-Gordan and $A, B, D$ coefficients. Although tedious, the interested
and pertinacious reader should have no trouble evaluating them for all relevant
sums, finding in particular that the $J \pm 2$ terms vanish. We forego detailing all those
manipulations here and simply use the results in both the primed and unprimed
terms of the equations above, which then simplify Eq. 7.55 to



1 

<(2J + 2)2^ 

N — 1 

11 11 , (^-63) 

-A^^'^l J, M,; -, J + -, AT - 1, zi)(J + 1, m;; -, J + -, AT - 1, t,\D^''' 

-Df V + 1, M,; ^, J + ^, AT - 1, ^i) ( J, M;; ^, J + ^, AT - 1, A,^'^' 
+Af''\J,M„ 1, J+ l,iV- 1,^i)(J,m;; ^,J+^,N- l,tMr'''' , 



Eq. 7.56 to 



^ If S^^'^l J - 1, M,; ^, J - ^; AT - 1, t,){J - I, M^;^, J - N - 1, 

+A'^'^\J, Mg,^,J-^;N-l, zi) (J - 1, m;; ^, J - ^; AT - 1, 
+5^ V - 1, M,; ^, J - ^; AT - 1, t,){J, M^;^, J - ^; N - I, t,\Ar'^'' 
J, M,; ^, J - ^; AT - 1, t,) (J, Ml; ^,J-In-1,h \Ai^^'' , 

(7.64) 
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Eq. 7.57 to 



3 



a^2J + 2 

L ^ J+i 7 I 3 J 



2 ii=l 



(2 J + 2)2^^ + ^' 2' ^ + 2' ^ ~ ^' '^^^"^ + ^' 2 + 2 ' ^ ~ ^' '^'^^ 
^^^^^ri"^' M,; ^, J + ^, iV - 1, ^,) {J + 1, M;; ^ J + ^, iV - 1, 



(2J + 2)2 ' ' 2' 2' ' ' ' " 2' 2 

^^^D^^l J + 1, M,; i J + i iV - 1, J, M;; ^, J + ^, AT - 1, 

l^^^'^l^, M,; 1, J + iV - 1, (J, M;,; ^ J + 1 iV - 1, ^M'r''' , 

(7.65) 



Eq. 7.58 to 



«iV-l 

X 



J_ 1 ^ 



"JV-I 



^ ( J + 1)5/'*V - 1, M,; ^, J - ^; iV - 1, J - 1, M;; ^, J - ^; iV - 1, 



ii=i 



J, M,; ^, J - ^; iV - 1, zi) (J - I, M^;^, J - N - 1, 
J - 1, M,; ^, J - ^; iV - 1, J, Ml;^,J-^-;N-l, t^lAi'""' 



Eq. 7.59 to (since J + 2 terms vanish) 



(7.66) 



"TV 



J+i 



<2(J+ l)d; 



^ 5^ V + 1, M,; J, J + |, AT - 1, zi) 



n=i 



(J + l,M;;^,J+|iV-l,zi|Z}/'^^' , (7.67) 
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Eq. 7.60 to 



J+I 

1 '^A^— 1 2(7^ 



1^ ^ ..J-l 7 I 1 J 



X 



44J22( J - I) L J 

5f 'V - 1, M,; 1, J - ^; AT - 1, J - 1, 1, J - ^; iV - 1, z^l^/'^' 



-2( J - V - 1, M,; J - -; iV - 1, J, M;; -, J - -; iV - 1, z,\A^ 

+4( J - 'V, M„^,J-^;N-l,t,){J,M^;^,J-^;N- I, tMr'^' , 

(7.68) 



Eq. 7.61 to (since J — 2 terms vanish) 



4"J 

^£'s/'*V-l,M,;i,J-|,Ar-l,.i)(J-l,il<4,^-|,Ar-l,^i|S/>^' 

(7.69) 



dpJdi-'^ " ' ^'^"^'2' 2' ^'^"-'2'" 2' 



and Eq. 7.62 to 



a 



AT-l 



44t|(2J+l) 



7 11 11 

J] + 1, M,; -, J + -, iV - 1, ^i) ( J + 1, M;; -, J + -, iV - 1, 



ii=i 



j + r 



+ j^^f V, M,; ^, J + ^, iV - 1, J + 1, M;; ^, J + ^, iV - 1, z,\D; 
+ j^/^f V + 1, M,; ^, J + ^, iV - 1, ^i) (J, M;- i J + i iV - 1, 



(7.70) 
7.5.4 Recover $g_{qr}(J, M, M', N)$

We now combine the equations from the previous subsection to recover the Identity
in Eq. 7.45. Given that density operators in the collective state representation lack
coherences between different $J$ irreps, we expect $|J\pm1\rangle\langle J|$ and $|J\rangle\langle J\pm1|$ terms to
vanish. Since both the $|J\rangle\langle J\pm1|$ and $|J\pm1\rangle\langle J|$ terms have the same coefficients, we
need only explicitly deal with one of the two. Starting with $|J+1\rangle\langle J|$ coefficients
from Eqs. 7.63, 7.65 and 7.70, we find

(2J + 3) r a^t|2J + 2. 



"TV 



1 

(2J + 2)2 " (2J + l)(2J + 2)2 



[1 + 



] 



+ 



a 



N-l 



iJ + i){2J + i)yH 



N-l 



1 



"<(2J + 2)2 

: 



(A^+1) A^ + 2J + 2 



2 J 



2 J 



(7.71) 
(7.72) 



Similarly, for |J — 1)(J| coefficients in Eqs. 7.64, 7.66 and 7.68, we have 



1 



1 + 



J+5 
^N-l^J + 2' 



[1 + 



a 



N-l 



2J 



y 2 J 



r] 







(7.73) 



Turning to J + 1 terms from Eqs. 7.63, 7.65 and 7.70, the coefficients sum to 



"TV 



(2J + 2)2 (2J + l)(2J + 2)2 



[1 



J+- 
ajv4 2J + 2 . 

3 J 



d 



■J+-2 J + 



N-l 



+ 



J 



a 



N-l 



1 A^ + 1 



1 + 



4(2J + 2)2V^ ' (2J + 3)(2J+1) 
1 2J + A^ + 4 



+ 



J(A^ + 2J + 2) 
2J+ 1 



<8J2 + 20J+ 12 



a 



J+i 

N 



(7.74) 



Chapter 7. Model Reduction of Collective Qubit Dynamics 



which gives overall 



a 



J+i 

N 



<2(J+l)d-^ 



1 11 
^ J2 V + l,M,;-,J+-,N-l,z^) 



X (J+l,M;;^,J+^,iV-l,zi|Z}/'*^ 



The J terms from Eqs. 7.63, 7.65 and 7.70 have coefficients 



1 



{2J + 3f 



<V(2^ + 2)2 (2J+l)(2J + 2)2 
1 



r a^2J + 2. 



d 



N-l 



+ 



a 



Af-l 



J(J+1)(2J + 1) ^^+1 



1 



(i}(,(2J + 2)2 



1 + 



(Ar + l)(2J + 3) iV + 2J + 2 



2J + 1 



J(2J+ 1) 



"<2J 



1 + 



a 



J+i 

AT 



2J+ 1 



< J +1 



which gives overall 



1 

2J 



1 + 



a 



J+i 

TV 



2J+ 1 



dj^ J + 1 



X 



1 1 1 

^ J]A^^|J,M,;-,J+-,iV-l,^i) 

12 = 1 



X (J,M;-^,J+^,iV-l,^i|A 



J,M 



Similarly, the J terms from Eqs. 7.64, 7.66 and 7.68 have coefficients 
1 



<4J2 



1 



J+- 



+ 2(J--)[1 + 



a 



J+I 

Af-l 



2J 



J-\ J + 



d 



t] 



N-l 



'di,AJ^ 



'dif2J 



N-2J, 1 



1 + 



a 



J+i 

N 



2J+ 1 



< J+1 
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which gives 



1 

2J 



1 + 



a 



J+i 

N 



2J+ 1" 



djf J+1 



X 



"iV-l 



7^ 5Z ^9 



X (J,M;;i,J-^,iV-l,^i|A,^'^^' 



(7.81) 



And finally, the J — 1 sums from Eqs. 7.64, 7.66 and 7.68 have coefficients 



1 + 



a 



J+h 

N- 



\2J{J + 1] 



"at 



+ 



2(^-2 

1 

<4J2 



2 J 



1 + 



■^-5 J- 
TV-l 

1 

+ 



r] 



N 



2J-1 2 J 



rr(^+^ + 273i) 



AT 



(7.82) 



which gives 



n=i 



^-l,M,;^,J-^,iV-l,2i) 

X (J-l,M;;^,J-^,iV-l,zi|5^ 



(7.83) 



From the definition of $\overline{|J,M,N\rangle\langle J,M',N|}$ given in Eq. 7.29, we see that Eqs.
7.79 and 7.81 correspond to the $\overline{|J,M,N\rangle\langle J,M',N|}$ terms in Eq. 7.45. A similar
combination of Eqs. 7.69 and 7.83 corresponds to the $J-1$ term and the combination
of Eqs. 7.67 and 7.76 corresponds to the $J+1$ term. We have thus shown inductively
that Identity 1 holds. □

7.6 Examples

As discussed in the introduction, realistic decoherence models for an ensemble of
spin particles are often described most aptly by a symmetric sum over local channels.
Consider, for example, the open system dynamics governed by the master equation

$$\frac{d\rho(t)}{dt} = -i[H,\rho(t)] + \Gamma\,\mathcal{L}[s]\rho(t) , \tag{7.84}$$

where $H$ (and any measurements performed) are described by collective operators,
but the decoherence involves the symmetric Lindblad superoperator $\mathcal{L}[s]$ of the form
in Eq. (7.38). As this decoherence model does not preserve symmetric states, it
has been common practice to consider instead the associated collective process $\mathcal{L}[S]$
given in Eq. (7.34) with $S = \sum_n s^{(n)}$.

To illustrate the difference between symmetric and collective decoherence models,
consider the open system dynamics of two representative problems. First, compare
the dynamics generated by the symmetric-local $\mathcal{L}[s]$ versus collective $\mathcal{L}[S]$
Lindblad master equations applied to an initial superposition (cat) state
$|\psi(0)\rangle = (|\tfrac{N}{2},+\tfrac{N}{2}\rangle + |\tfrac{N}{2},-\tfrac{N}{2}\rangle)/\sqrt{2}$.
Figure 7.2(a-b) depicts the fidelity $F(t) = \langle\psi(0)|\rho(t)|\psi(0)\rangle$
evolved under Eq. (7.84) (with $H = 0$) for two different types of decoherence channels:
Fig. 7.2(a1-a2) compares the collective versus symmetric master equations with
$s = \sigma_-$ for $N = 10$ and $N = 100$ particles, respectively; and Fig. 7.2(b1-b2) makes a
similar comparison for $s = \sigma_z$. The examples considered (including some not reported
here) suggest symmetric local decoherence models can generate dynamics that are
appreciably different from their collective analogs. This is perhaps not too surprising:
for an initially symmetric state, collective decoherence models $\mathcal{L}[S]$ confine the
dynamics to only the maximum-$J$ irrep, whereas symmetric local models $\mathcal{L}[s]$ do not
necessarily preserve the irrep decomposition of the initial state. Fig. 7.2(c) depicts the norm
of each total-$J$ irrep block of the density operator, $N_J = \mathrm{tr}[P_J\rho(t)]$, as a function of
time for $\mathcal{L}[\sigma_-]$ ($P_J = \sum_M |J,M\rangle\langle J,M|$). The observation that small-$J$ irreps are
only minimally populated suggests that further model reduction by truncating the
Hilbert space to only the largest $J$ blocks could be beneficial.

Figure 7.2: Decoherence of the initial superposition state
$|\psi(0)\rangle = (|\tfrac{N}{2},+\tfrac{N}{2}\rangle + |\tfrac{N}{2},-\tfrac{N}{2}\rangle)/\sqrt{2}$:
(a1-a2) time-dependent fidelity with the initial state for both symmetric
local $\mathcal{L}[\sigma_-]$ and collective $\mathcal{L}[J_-]$ decoherence for different numbers of
particles; (b1-b2) similar comparison for $\mathcal{L}[\sigma_z]$ versus $\mathcal{L}[J_z]$; (c) time-dependent
populations of different total-$J$ irreps for $\mathcal{L}[\sigma_-]$. Horizontal axes: decoherence
time ($\Gamma t$).
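The leakage out of the maximum-$J$ irrep can be seen already for $N = 2$. The sketch below is a toy illustration, not one of the simulations reported in this chapter. It evaluates the initial rate of change of the singlet ($J = 0$) population for the state $|{\uparrow\uparrow}\rangle$ under symmetric local $\mathcal{L}[\sigma_-]$ and under collective $\mathcal{L}[J_-]$ decoherence, with $\Gamma = 1$ and $H = 0$; the helper names are hypothetical.

```python
import numpy as np

sm = np.array([[0.0, 0.0], [1.0, 0.0]])          # sigma_- in the basis (up, down)
I2 = np.eye(2)
local_ops = [np.kron(sm, I2), np.kron(I2, sm)]   # sigma_-^(1) and sigma_-^(2)
J_minus = local_ops[0] + local_ops[1]            # collective lowering operator


def lindblad(op, rho):
    """Single-operator Lindblad term: op rho op^dag - (1/2){op^dag op, rho}."""
    op_dag = op.conj().T
    return op @ rho @ op_dag - 0.5 * (op_dag @ op @ rho + rho @ op_dag @ op)


up_up = np.zeros(4)
up_up[0] = 1.0                                   # |up, up>, a symmetric J = 1 state
rho0 = np.outer(up_up, up_up)
singlet = np.array([0.0, 1.0, -1.0, 0.0]) / np.sqrt(2)   # |J = 0, M = 0>


def singlet_rate(drho):
    """Rate of change of the J = 0 population, <singlet| d(rho)/dt |singlet>."""
    return float(singlet @ drho @ singlet)


rate_local = singlet_rate(sum(lindblad(op, rho0) for op in local_ops))
rate_collective = singlet_rate(lindblad(J_minus, rho0))
print(rate_local, rate_collective)
```

In this toy case the local model feeds the singlet at unit rate, while the collective model leaves it unpopulated, consistent with the confinement of collective dynamics to the maximum-$J$ irrep.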

As a second example, consider comparing symmetric-local versus collective decoherence
models applied to dynamically-generated spin squeezing under the counter-twisting
Hamiltonian $H = -i\lambda(J_+^2 - J_-^2)$ [Kitagawa and Ueda 1993]. I performed
simulations by time-evolving Eq. (7.84) from the initial spin-coherent state
$|\tfrac{N}{2},\tfrac{N}{2}\rangle$ for $N = 100$ with $\mathcal{L}[\sigma_-]$ and $\mathcal{L}[J_-]$. Figure 7.3 depicts the
time-dependent squeezing parameter $\xi^2 = N(\Delta J_y)^2/\langle J_z\rangle^2$, each for
$\Gamma = \lambda/5, \lambda, 5\lambda$. Under the conditions considered, symmetric local decoherence was
evidently less destructive to the squeezing dynamics than collective models. As observed
for the cat-state dynamics, to a large extent the main effect of symmetric-local
decoherence is leakage from the maximum $J$ irrep. But since the driving Hamiltonian $H$
involves only collective spin operators, the coherent dynamics decouple for different
total $J$: the population in each irrep block then undergoes its own squeezing, evidently
making the dynamics more resistant to symmetric local decoherence than collective
processes.

Figure 7.3: Time-evolution of the squeezing parameter $\xi^2$ for a spin ensemble
driven by $H = -i\lambda(J_+^2 - J_-^2)$ subject to $\mathcal{L}[\sigma_-]$ (dotted lines) and
$\mathcal{L}[J_-]$ (dashed lines) with relative decoherence rates $\Gamma = \lambda/5, \lambda, 5\lambda$.
For comparison, the solid line denotes decoherence-free squeezing.



7.7 Summary 



I have presented an exact formula for efficiently expressing symmetric processes of
an ensemble of spin-1/2 particles. The efficiency is achieved by generalizing the
notion of collective spin states to be any such state which does not distinguish
degenerate irreps. For a collection of $N$ spin-1/2 particles, the effective Hilbert space
dimension grows as $N^2$, a drastic reduction from the full Hilbert space scaling of
$2^N$. The collective representation is used in Identity 1, which gives a closed-form
expression for evaluating non-collective terms from symmetric Lindblad operators.
Simulations confirm that symmetric local decoherence models can be drastically
different from collective decoherence models. Unfortunately, due to the complicated
structure of adding spin-$J > \tfrac{1}{2}$ particles [Mihailov 1977], these results do not appear
to generalize. Nonetheless, I believe that this approach will become a useful tool in
analyzing collective spin phenomena and, in particular, accurately considering the
role of decoherence in collective spin experiments.

7.A Explicit Simplification of Typical Sums

In Section 7.5.3, we simplify the sums in Eqs. 7.55-7.62 but do not go through the detailed
algebra. The work involves manipulating products of Clebsch-Gordan and $A, B, D$
coefficients. In this appendix, I explicitly calculate two representative sums from this
set and invite the reader to calculate the remainder in a similar fashion.

First, consider the sums over $J_1$ and $m_1$ in Eq. 7.55, which is representative of
sums in Eqs. 7.55 and 7.56. For $J_1 = J + 1$



11 11 

7 , 1 



J,M ^ 2 ' 2 



J+i,A/-i 



J+\,M-\ 



2(J+1) 



1 



v/(J + M + 2)(J + M + l) q = + 

(l = - 



^{J - M + 2){J - M + 1) 
y(J-M + l)(J + M + l) 



2 J + 2^" 
and for Ji = J 



J,M 



q = z 



(7.85) 



(7.86) 



1 1 



11 11 

A2'2J,Mqi^2'2q J,M^2'2 

^1 ^J+lM^\ 



2 ' 2 



1 



1 



-r ^q '-^ 7 , 1 ».f , 1 '-^7,1 



J+|,A/+| 



2(J 



^/{J^'M){JTWT1) q = + 
y/{J + M){J-M + l) q = - 
M q = z 



(7.87) 



2J + 2 " 



(7.88) 



1 1 

where AV^ 



1 _ 1 
Al' 5 



0, A 



1 _ 1 

2 ' 2 



1 1 

Al'^ 



1 and Al 



Similarly, consider the sums over Ji and mi in Eq. 7.58, which is representative 
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of Eqs. 7.57-7.62. For Ji = J — 1, we have 



1 



B, 



J-\,Mq+\ J+\,M+\ 



4J(J+1) 



I 2' 



1 + 



J-M + l^^+l-Af-l 



'(J-Mq)(J-M + l) ^j+iA/^i 2(J+1) 



I 2' 



4J(J+1) J-M + 1 



J+1 
J 



-^{J + M){J + M-l) q = - 
^^{J + M){J - M) q = z 



Similarly, for Ji = J, we have 



J+i,A/-i 



1 1 

' 2 



J.M 



'(J + M,)(J-M + l)^j+i,Af-i 



4J(J+1) 



1 - 



J-M„ J + M+IB, 



'(J + M,)(J-M + 1) „J+lM-i 



4J(J+1) 
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J(J + 1) 



^y{J-M){J + M+l) g = + 
^y{J + M){J-M + l) g = - 
M q = z 



J(J+1)^^ 



J,M 



(7.90) 
Appendix A

Riccati Equations

The ability to solve matrix Riccati equations is an important tool when using the
Kalman filter given in Theorem 2.8. Following [Reid 1972; Stockton et al. 2004], I
review a technique for reducing the nonlinear system into a set of linear differential
equations. Consider the matrix $Z(t)$ which satisfies the Riccati equation

$$\frac{dZ}{dt} = A(t)Z - ZD(t) - ZC(t)Z + B(t). \tag{A.1}$$

Instead of solving this directly, introduce the decomposition $Z(t) = X(t)Y^{-1}(t)$ and
solve the equivalent linear system

$$\frac{d}{dt}\begin{pmatrix} X(t) \\ Y(t) \end{pmatrix}
 = \begin{pmatrix} A(t) & B(t) \\ C(t) & D(t) \end{pmatrix}
   \begin{pmatrix} X(t) \\ Y(t) \end{pmatrix} . \tag{A.2}$$

To check that this decomposition satisfies the original Riccati equation, we simply
calculate

\begin{align}
\frac{dZ}{dt} &= \frac{dX}{dt}Y^{-1} + X\frac{dY^{-1}}{dt} \tag{A.3}\\
 &= \frac{dX}{dt}Y^{-1} - XY^{-1}\frac{dY}{dt}Y^{-1} \tag{A.4}\\
 &= -XY^{-1}\left(C(t)X + D(t)Y\right)Y^{-1} + \left(A(t)X + B(t)Y\right)Y^{-1} \tag{A.5}\\
 &= -ZC(t)Z - ZD(t) + A(t)Z + B(t) . \tag{A.6}
\end{align}
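As a small worked example of the linearization, consider the scalar Riccati equation $dz/dt = 1 - z^2$ with $z(0) = 0$, i.e. $A = D = 0$ and $B = C = 1$, whose solution is $z(t) = \tanh t$. The sketch below integrates the linear system for $(x, y)$ with a hand-written fourth-order Runge-Kutta step and recovers $z = x/y$. The scalar restriction, the initial condition $X(0) = Z(0)$, $Y(0) = 1$, the integrator and the function names are all choices made for this illustration only.

```python
import math


def rk4_step(f, state, t, dt):
    """One classical Runge-Kutta step for d(state)/dt = f(t, state)."""
    k1 = f(t, state)
    k2 = f(t + dt / 2, [s + dt / 2 * k for s, k in zip(state, k1)])
    k3 = f(t + dt / 2, [s + dt / 2 * k for s, k in zip(state, k2)])
    k4 = f(t + dt, [s + dt * k for s, k in zip(state, k3)])
    return [s + dt / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]


def solve_riccati_linearized(a, b, c, d, z0, t_final, steps=1000):
    """Integrate dx/dt = a x + b y, dy/dt = c x + d y and return z = x / y."""
    def rhs(t, state):
        x, y = state
        return [a * x + b * y, c * x + d * y]

    state = [z0, 1.0]                 # X(0) = Z(0), Y(0) = 1
    dt = t_final / steps
    for n in range(steps):
        state = rk4_step(rhs, state, n * dt, dt)
    return state[0] / state[1]


z = solve_riccati_linearized(a=0.0, b=1.0, c=1.0, d=0.0, z0=0.0, t_final=1.0)
print(z, math.tanh(1.0))
```

Here the linear system has the solution $x = \sinh t$, $y = \cosh t$, so the ratio reproduces $\tanh t$ without ever integrating the nonlinear equation directly.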
Appendix B

Numerical Methods for Stochastic Differential Equations

Many of the filters and SDEs in this thesis do not admit an analytic solution. As such,
it is useful to have methods for numerically simulating or integrating a stochastic
system. An excellent resource for such methods is the text by Kloeden and Platen
[1999], in which the following two integrators are discussed in more detail. In the
following, I consider the $n$-dimensional stochastic process $X_t$ and the $m$-dimensional
Wiener process $dW_t$ related via the SDE

$$dX_t = a(t, X_t)\,dt + \sum_{j=1}^{m} b^{j}(t, X_t)\,dW_t^{j}. \tag{B.1}$$

The first integrator is the Euler or Euler-Maruyama scheme and is the trivial extension
of the standard Euler method for integrating ordinary differential equations.
We begin by discretizing the time-domain in terms of a step-size $\Delta t$. The integrator
then estimates the state at times $t_{n+1} = t_n + \Delta t$ by stepping the state forward via the
SDE. The Euler approximation for the $k$-th entry of $X_t$ at time-step $t_{n+1}$ given the
state at timestep $t_n$ is then given by

$$X^{k}_{t_{n+1}} = X^{k}_{t_n} + a^{k}(t_n, X_{t_n})\,\Delta t + \sum_{j=1}^{m} b^{k,j}(t_n, X_{t_n})\,\Delta W^{j} \tag{B.2}$$

where $\Delta W^{j}$ is a Gaussian pseudo-random number with mean zero and variance $\Delta t$. The
simplicity of this approach is a clear advantage, but its order of convergence is 0.5,
meaning the error between the exact solution and its approximation at time $t_n$ is
bounded by $\alpha(\Delta t)^{0.5}$, where $\alpha$ is some constant independent of $\Delta t$.
Note that this is worse than the Euler method for ODEs, which is order 1.0.
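A minimal sketch of the Euler-Maruyama update of Eq. (B.2) for a scalar process with a single noise term is given below. It is not the code used for the simulations in this thesis; the function name, the argument names and the Ornstein-Uhlenbeck example $dX_t = -X_t\,dt + 0.5\,dW_t$ are assumptions made for illustration.

```python
import math
import random


def euler_maruyama(a, b, x0, t0, t_final, steps, rng):
    """Scalar Euler-Maruyama: X_{n+1} = X_n + a(t_n, X_n) dt + b(t_n, X_n) dW."""
    dt = (t_final - t0) / steps
    x, t = x0, t0
    path = [x]
    for _ in range(steps):
        dw = rng.gauss(0.0, math.sqrt(dt))   # Wiener increment: mean 0, variance dt
        x = x + a(t, x) * dt + b(t, x) * dw
        t += dt
        path.append(x)
    return path


# Example: Ornstein-Uhlenbeck process with a fixed seed for reproducibility
rng = random.Random(1234)
ou_path = euler_maruyama(lambda t, x: -x, lambda t, x: 0.5,
                         x0=1.0, t0=0.0, t_final=1.0, steps=1000, rng=rng)
print(len(ou_path), ou_path[-1])
```

Setting the diffusion coefficient to zero reduces the scheme to the ordinary Euler method, which provides a convenient deterministic check of an implementation.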

The other method used significantly in simulations for this thesis is an order 2.0 
weak predictor-corrector method, which offers improved stability and convergence at 
the cost of more computational complexity. For simplicity, restrict consideration to 
a single noise term m = 1 and time-independent a and b; see Kloeden and Platen 
[1999, Chapt. 15] for the multiple noise version. The estimate is then given by 




1 



(B.3) 



where 




1 



where the supporting values are given by 




(B.5) 



and with predictor 




1 




with supporting value 



T = X 



+ a(X,J At + 6(X,J AW 
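Appendix C

Stochastic Schrödinger Equation

Lacking any extra sources of decoherence, pure states remain pure under the dynamics
described by the quantum filtering equation. As such, it is often convenient for
analysis and simulation to have a pure state description of the dynamics in terms of
a stochastic Schrödinger equation (SSE). In this appendix, I briefly derive the SSE
for the general adjoint filter

$$d\rho_t = -i[H,\rho_t]\,dt + \left[ L\rho_t L^\dagger - \tfrac{1}{2}L^\dagger L\rho_t - \tfrac{1}{2}\rho_t L^\dagger L \right] dt
 + \left[ L\rho_t + \rho_t L^\dagger - \langle L + L^\dagger\rangle\rho_t \right] dW_t . \tag{C.1}$$

We begin by writing

\begin{align}
d|\psi\rangle_t &= A|\psi\rangle_t\,dt + B|\psi\rangle_t\,dW_t \tag{C.2}\\
d\langle\psi|_t &= \langle\psi|_t A^\dagger\,dt + \langle\psi|_t B^\dagger\,dW_t . \tag{C.3}
\end{align}

From the Itô rules, we have

\begin{align}
d(\rho_t) &= |\psi\rangle_t\,d(\langle\psi|_t) + d(|\psi\rangle_t)\,\langle\psi|_t + d(|\psi\rangle_t)\,d(\langle\psi|_t) \notag\\
 &= (A\rho_t + \rho_t A^\dagger)\,dt + (B\rho_t + \rho_t B^\dagger)\,dW_t + B\rho_t B^\dagger\,dt . \tag{C.4}
\end{align}

Comparing the coefficients to the quantum filtering equation, we read off

\begin{align}
B &= L - \langle L\rangle \tag{C.5}\\
B^\dagger &= L^\dagger - \langle L^\dagger\rangle \tag{C.6}
\end{align}

so that

$$B\rho_t B^\dagger = L\rho_t L^\dagger - \langle L^\dagger\rangle L\rho_t - \langle L\rangle\rho_t L^\dagger + \langle L\rangle\langle L^\dagger\rangle\rho_t . \tag{C.7}$$

We try setting

$$A = -\tfrac{1}{2}\left( L^\dagger L - 2\langle L^\dagger\rangle L + \langle L\rangle\langle L^\dagger\rangle \right) \tag{C.8}$$

which means that

\begin{align}
A\rho_t + \rho_t A^\dagger &= -\tfrac{1}{2}\left( L^\dagger L - 2\langle L^\dagger\rangle L + \langle L\rangle\langle L^\dagger\rangle \right)\rho_t
 - \rho_t\,\tfrac{1}{2}\left( L^\dagger L - 2\langle L\rangle L^\dagger + \langle L\rangle\langle L^\dagger\rangle \right) \tag{C.9}\\
 &= -\tfrac{1}{2}L^\dagger L\rho_t - \tfrac{1}{2}\rho_t L^\dagger L + \langle L^\dagger\rangle L\rho_t
 + \rho_t L^\dagger\langle L\rangle - \langle L\rangle\langle L^\dagger\rangle\rho_t \tag{C.10}
\end{align}

and therefore

\begin{align}
A\rho_t + \rho_t A^\dagger + B\rho_t B^\dagger &= -\tfrac{1}{2}L^\dagger L\rho_t - \tfrac{1}{2}\rho_t L^\dagger L
 + \langle L^\dagger\rangle L\rho_t + \rho_t L^\dagger\langle L\rangle \notag\\
 &\quad - \langle L\rangle\langle L^\dagger\rangle\rho_t + L\rho_t L^\dagger
 - \langle L^\dagger\rangle L\rho_t - \langle L\rangle\rho_t L^\dagger + \langle L\rangle\langle L^\dagger\rangle\rho_t \tag{C.11}\\
 &= L\rho_t L^\dagger - \tfrac{1}{2}L^\dagger L\rho_t - \tfrac{1}{2}\rho_t L^\dagger L \tag{C.12}
\end{align}

which is the deterministic part of the quantum filtering equation as desired, apart
from the Hamiltonian term $-i[H,\rho_t]$, which is recovered by adding $-iH$ to $A$.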
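The operator algebra above can be spot-checked numerically. The sketch below draws a random (generally non-Hermitian) $L$ and a random pure state, builds $A$ and $B$ as in Eqs. (C.5) and (C.8), and confirms that $A\rho + \rho A^\dagger + B\rho B^\dagger$ reproduces the Lindblad drift, while $B\rho + \rho B^\dagger$ reproduces the stochastic coefficient $L\rho + \rho L^\dagger - \langle L + L^\dagger\rangle\rho$. The dimension, the seed and the variable names are arbitrary choices for this illustration, and $H$ is set to zero.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 4
L = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
psi = rng.normal(size=dim) + 1j * rng.normal(size=dim)
psi /= np.linalg.norm(psi)                  # normalized pure state
rho = np.outer(psi, psi.conj())

L_dag = L.conj().T
exp_L = np.trace(L @ rho)                   # <L>
exp_L_dag = np.conj(exp_L)                  # <L^dagger>, since rho is Hermitian
eye = np.eye(dim)

B = L - exp_L * eye                                                  # Eq. (C.5)
A = -0.5 * (L_dag @ L - 2 * exp_L_dag * L + exp_L * exp_L_dag * eye)  # Eq. (C.8)

drift = A @ rho + rho @ A.conj().T + B @ rho @ B.conj().T
lindblad_drift = L @ rho @ L_dag - 0.5 * (L_dag @ L @ rho + rho @ L_dag @ L)

diffusion = B @ rho + rho @ B.conj().T
innovation_coeff = L @ rho + rho @ L_dag - (exp_L + exp_L_dag) * rho

print(np.allclose(drift, lindblad_drift), np.allclose(diffusion, innovation_coeff))
```

The check relies only on $\langle L\rangle$ being a scalar, so it would pass for a mixed $\rho$ as well; purity matters only for the existence of the state vector $|\psi\rangle_t$ itself.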
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